Data & System Monitoring
What is this workstrand about?
Workstrand 3 is focused on getting data into the models that SIPHER is using.
The main tasks of Workstrand 3 are centered on:
- Identifying relevant data and obtaining access to data for use by SIPHER;
- Looking after the data, ensuring it is managed and stored securely in line with GDPR principles and our data sharing agreements with policy partners;
- Ensuring data is of good quality and linking it to other data sets where appropriate; and,
- Creating “synthetic populations” by combining policy partner data with publicly available datasets. The computer-simulated “artificial” populations of Sheffield, Greater Manchester, and Scotland were created to reflect as many of the key attributes of the real populations as possible.
What does it involve?
Working with the embedded researchers in Sheffield City Council, Greater Manchester Combined Authority, and Scottish Government, we explore the types of data held by these organisations and, where appropriate, securely manage the data for use by colleagues across the SIPHER Consortium.
This data audit has identified the kinds of data available for specific areas of our research, e.g. health inequalities, socioeconomic conditions or housing, as well as information on how this data can be accessed. The audit also provided insights into data gaps, and specifically where alternative data sets may be required to meet our needs.
Workstrand 3 has documented the disparate and diverse data available from national repositories and from our policy partners. Through rigorous data checks, the team ensures that the data used is anonymous and meets the highest quality standards. Data identified through the data audit has been collated and warehoused in the Leeds Institute for Data Analytics.
An automated monitoring process is used to identify new data releases and update data sets regularly to ensure access to the latest information. This allows new data sets to be embedded within the models as soon as they are available.
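As an illustration only, one simple way to spot a new release is to compare a fingerprint of each file against the fingerprint stored at the last check. The sketch below is not SIPHER's actual monitoring system; the function names, file names, and file contents are all hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact file content."""
    return hashlib.sha256(content).hexdigest()


def find_new_releases(known: dict, latest: dict) -> list:
    """List dataset names that are new or whose content differs from the stored fingerprint."""
    changed = []
    for name, content in latest.items():
        if known.get(name) != fingerprint(content):
            changed.append(name)
    return sorted(changed)


# Hypothetical example: fingerprints stored at the last check vs. files fetched now.
known = {"housing.csv": fingerprint(b"area,count\nA,10\n")}
latest = {
    "housing.csv": b"area,count\nA,12\n",    # content changed, so treated as a new release
    "employment.csv": b"area,count\nA,5\n",  # not seen before, so treated as a new data set
}
print(find_new_releases(known, latest))
```

A scheduled job could run a check like this and refresh the stored fingerprints after each update.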
This process also makes it possible to examine trends by comparing each data update with previous versions. This can help identify changes in population characteristics, for example the number of people who register for specific services in a given area.
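A minimal sketch of such a version comparison, assuming each release can be summarised as a count per area. The area names and figures are made up for illustration, and `compare_versions` is a hypothetical helper rather than part of any SIPHER tool.

```python
def compare_versions(previous: dict, current: dict) -> dict:
    """Return the change in count for every area present in either version."""
    areas = sorted(set(previous) | set(current))
    # An area missing from one version is treated as having a count of zero there.
    return {area: current.get(area, 0) - previous.get(area, 0) for area in areas}


# Hypothetical service registration counts per area in two successive releases.
previous = {"Area A": 120, "Area B": 80}
current = {"Area A": 135, "Area B": 78, "Area C": 15}

for area, delta in compare_versions(previous, current).items():
    print(f"{area}: {delta:+d}")
```

Keeping the differences, rather than only the latest values, is what allows trends to be followed across several updates.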
The models in SIPHER Workstrand 5 require a single dataset with very detailed data about each individual living in our three policy partner areas, including information relevant to our policy topics such as age, gender, income, family relationships, physical and mental health, current employment, housing situation and so on.
Unsurprisingly, such comprehensive datasets do not exist. So, we are using established microsimulation modelling methods to create novel, synthetic datasets by bringing together data held by policy partners and other existing sources such as the Census and government surveys.
Synthetic data retains the characteristics of the original data and maintains the relationship between variables (e.g. one dataset may tell us about the characteristics of employed and unemployed people in Sheffield, another may have information about people living with mental illness, and a third may have a lot of detail about types of housing). These synthetic populations are key inputs into both the macro modelling in Workstrand 4 and micro modelling in Workstrand 5.
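The text above does not say which microsimulation method is used. As a hedged illustration, iterative proportional fitting (IPF) is one widely used reweighting technique in spatial microsimulation: a small cross-tabulation from a survey is rescaled until its totals match known area totals, while the association between the variables in the survey is preserved. The figures below are invented, and the `ipf` function is a sketch, not SIPHER's implementation.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Scale a 2D seed table until its row and column sums match the targets."""
    table = [row[:] for row in seed]  # work on a copy so the seed is left unchanged
    for _ in range(iterations):
        # Scale each row to match its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Scale each column to match its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Stop once the row sums still agree with their targets after column scaling.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical survey cross-tabulation: rows = employed / unemployed,
# columns = owner-occupier / renter.
seed = [[40.0, 20.0], [10.0, 30.0]]
row_targets = [700.0, 300.0]  # assumed area totals by employment status
col_targets = [550.0, 450.0]  # assumed area totals by housing tenure

fitted = ipf(seed, row_targets, col_targets)
for row in fitted:
    print([round(value, 1) for value in row])
```

The fitted cell counts can then be used as weights for drawing synthetic individuals, so that the synthetic population matches the area totals without containing any real person's record.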
What is it achieving?
Workstrand 3 has established an infrastructure for procuring and storing data required by the SIPHER Consortium. It has produced attribute-rich synthetic datasets with variables that are of particular interest to partner organisations. The SIPHER Synthetic Population represents the characteristics and relationships within real data sets while maintaining individual anonymity.