James Nurdin
Email: 2570809n@student.gla.ac.uk
Office: Room 320, Level 3, School of Computing Science, Sir Alwyn Williams Building, University of Glasgow, Glasgow, G12 8QQ
https://orcid.org/0009-0008-1454-5198
Research title: Optimising Data Services for Sustainability Using Machine Learning-Based Forecasting
Research Summary
Research Abstract
Barclays currently operates a complex network of data services, comprising a range of large heterogeneous databases, key-value stores, object stores and more. For example, one data store holds over 6 TB of data across 75 collections, with ingestion and consumption running 24/7. Data is also queried around 10,000 times a day through more than 25 APIs, which must respond within 600 ms to meet service level agreements. However, this data is stored in a way that does not lend itself to analytics use cases. This leads to frequent data duplication, because additional data stores must be created for each analytics and data visualisation task. As a result, there is a range of opportunities to optimise Barclays' data infrastructure to support more efficient analytics and operational use cases, thereby reducing Barclays' data footprint and hence the energy required to store and process that data. With the volume of data and the number of machine learning use cases ever increasing, this is paramount to ensuring the sustainability of the technology stack.
One key advantage that Barclays enjoys is the availability of deep logs on the usage of this data infrastructure. Furthermore, Barclays records integration patterns that describe how the data is used, allowing both analytics and operational use cases to be identified and modelled. Combined, these unique data points have the potential to be used to model optimised data storage solutions, enabling 1) reduced data duplication; and 2) next-generation data structures that are suitable for both analytics and operational needs, which to our knowledge do not currently exist.
The core topic of the PhD is to investigate how Barclays' existing logs of data usage can be used intelligently to optimise its data infrastructure.