SIPHER Synthetic Population - video - An animated video describing how we have created our synthetic population and how we will use it to understand how proposed policy changes might affect people in different ways.
Our Synthetic Population is a novel and unique dataset that provides a 'digital twin' enriched with a substantial amount of associated information representing the adult population in Scotland, England, and Wales - over 52 million people.
The SIPHER Synthetic Population for Individuals in Great Britain was created to support our policy partners and involved combining publicly available datasets including the UK Household Longitudinal Study (Understanding Society). The aim is for the resulting simulated “artificial” population to reflect as many of the key attributes of the real population as possible.
Access to high-quality data for individuals within a population is crucial for research and policy makers. It allows the identification of emerging issues and needs, and the assessment of policy impact.
However, detailed information on individuals, including their health or employment status, is often only available via national safe havens. These datasets have strict access requirements and long lead times for applications, so they do not typically allow for swift access and analysis. While commissioning survey data can provide an alternative, the sample size is often restricted, which limits opportunities to study small geographical areas and make direct comparisons between them.
To overcome these limitations, the SIPHER Synthetic Population was developed by our Data & System Monitoring workstrand (WS3) and is now available as part of the UK Data Service curated collection.
To further improve access to this resource, SIPHER has also developed an interactive dashboard. This tool allows exploration of an aggregated version of the dataset without any need for coding or data preparation. Its ‘click and explore’ format enables users to compare areas of interest, create bespoke detailed area profiles, develop customised data visualisations, and download the aggregate data used.
Together these resources provide researchers and policymakers with a powerful tool to explore and test policy options. This will significantly enhance research capabilities and help inform evidence-based policy decisions.
Resources
- SIPHER Synthetic Population Dataset
- SIPHER Synthetic Population for Individuals in Great Britain, 2019-2021 (UK Data Service Curated Collection, SN9277)
- User Guide (UK Data Service part of SN9277 Documentation)
- SIPHER Synthetic Population for Individuals in Great Britain, 2019-2021 – Supplementary Material (ReShare)
- SIPHER Synthetic Population Dashboard – Interactive ‘click and explore’ tool - without any requirement for coding or data preparation.
- SHOWCASE presentation video - A synthetic representative population for policy modelling - March 2025
- Understanding Society - UK Household Longitudinal Study - underpinning data source for the Synthetic Population.
- UK Data Service website - To create the Synthetic Population, Understanding Society survey data and small-area census information are required. Understanding Society survey data can be downloaded from the UK Data Service.
- Flexible Modelling Framework which is used to create synthetic populations - hosted on GitHub
- SIPHER Glossary - For clarification of our terminology and use of acronyms.
Publications:
- Rice, H.P., Hoehn, A., Meier, P. et al. An inclusive economy dataset for wards in Great Britain using administrative and synthetic data sources. Nature Scientific Data 12, 1230 (2025). https://doi.org/10.1038/s41597-025-05502-x
- Creating Data for Entire Populations. SIPHER Blog, 15 October 2024.
- Wu, G., Heppenstall, A., Meier, P., Purshouse, R. and Lomax, N. (2022) A synthetic population dataset for estimating small area health and socio-economic outcomes in Great Britain. Scientific Data. https://doi.org/10.1038/s41597-022-01124-9
News
- New Interactive Dashboard
In July 2024 SIPHER launched the SIPHER Synthetic Population Dashboard allowing exploration of an aggregated version of the SIPHER Synthetic Population for Individuals in Great Britain 2019-2021 - without the need for any coding or data preparation. This exciting new tool offers researchers & policymakers the opportunity to create bespoke detailed area profiles and customised data visualisations.
- Launch on UK Data Service
The SIPHER Synthetic Population is now available for full independent use as part of the UK Data Service curated collection. Our unique resource offers a 'digital twin' of over 52 million individuals in Great Britain, by combining various data sources including the UK Household Longitudinal Study (Understanding Society) dataset.
Read: Building synthetic population data – new research available from the UK Data Service
- Welsh Government & Public Health Wales Workshop, Cardiff
In March 2024 we held a successful SIPHER Synthetic Population Workshop for the Welsh Government and Public Health Wales in Cardiff. Participants rated the course highly, noting: “really interesting session and offered great scope for some future health analysis projects in Wales.” Led by Andreas Hoehn, SIPHER Research Associate, the half-day event equipped attendees with the knowledge and skills needed to navigate this innovative dataset independently. Follow-up plans look to support the active policy development process within the Welsh Government.
- CECAN Webinar
Nik Lomax, SIPHER Co-Investigator and Co-Lead on our Data & System Monitoring and Policy Microsimulation workstrands, presented an introduction to our Synthetic Population Dataset in a Centre for the Evaluation of Complexity Across the Nexus (CECAN) Webinar on 28 February 2024.
Watch: SIPHER Synthetic Population: An Introduction plus Q&A
- Introductory Workshop, Glasgow
In December 2023 we held a half-day introductory workshop on the SIPHER Synthetic Population for researchers at Glasgow City Council and the University of Glasgow.
This successful session allowed everyone to get “hands on” with this unique data set which provides a “digital twin” for the adult population in Scotland, England, and Wales, approximately 55 million individuals.
Technical Information
Provides technical details of the dataset's characteristics, including its strengths and limitations.
| Purpose | A quality-controlled, publicly available data source containing attribute-rich data at the individual level. The aim is to create a digital twin for every adult in the population, with a large amount of associated information about each person. |
| Context | Individual-level data enable us to understand individuals’ situations and what happens to them over time or when they are affected by external events or policies. The lack of a comprehensive register-based system in Great Britain has made it challenging to access data on individuals across multiple domains. The SIPHER Synthetic Population helps bridge this gap by providing a representative, attribute-rich dataset reflecting the whole of the adult population in Great Britain. By randomly selecting individuals from a survey and assigning them to small geographical areas based on census statistics, the SIPHER Synthetic Population ensures that the distribution of demographic characteristics for all sampled individuals corresponds exactly to the true demographic structure within each small census output area. This enables researchers to derive area-level profiles which would otherwise not be available. In more complex applications, the dataset can be used to simulate policy interventions and explore their potential impact on individuals and households at a granular resolution, distinguishing small geographical areas and even population subgroups within these areas. |
| Strengths | The SIPHER Synthetic Population is representative of the demographic characteristics of the respective area, down to a fine geographical resolution. Its main strength is that it provides a wide range of information at the level of individuals. This information can be aggregated into groupings of interest (e.g. sex, income groups) and particular geographical units of interest (LSOA/DZ; MSOA; Local Authorities etc.). The method used to develop the dataset is referred to as spatial microsimulation. We often use the SIPHER Synthetic Population in conjunction with other models we have developed. This enables us to determine whether an intervention has benefitted a population group of interest. |
| Limitations | The accuracy of the SIPHER Synthetic Population depends on the quality and availability of the underlying data. Some variables may have poor completion rates in the underlying survey, resulting in missing data after linkage. Despite the high number of participants in the Understanding Society survey, explicit spatial constraints cannot be applied when creating the dataset. This means that an individual who was interviewed as part of the survey and who actually resides in place X can be assigned to a variety of places A, B, and C, as long as they match the demographic constraints such as age, sex, marital status etc. Although recent updates of the code have led to more constraints on how to perform this selection process, it is important to remember that the creation of the SIPHER Synthetic Population is based on associations and descriptive statistics. It can only ever serve as an approximation of the true population in Scotland, England and Wales, which is likely to be much more heterogeneous and diverse than the population captured in the synthetic data source. Therefore, all results obtained from the SIPHER Synthetic Population should always be interpreted carefully as model output, and not as equivalent to a population-based register. |
| Geography | Individuals in the SIPHER Synthetic Population have a geography assigned to them (a synthetic DZ/LSOA). This allows all levels of geography upwards from DZ/LSOA Level for Scotland, England and Wales - excluding Northern Ireland - to be analysed and modelled. |
| Variables / Indicators | A large variety of variables can be included. This includes all variables included in the Understanding Society survey - the underlying survey data source. It is also possible to estimate other derived variables from this data source, for example ‘Equivalent Income’, using the ‘Equivalent Income Calculator’ method. |
| Time Period | The latest release reflects the years 2019-2021. Results from the UK census 2011 are used as constraints for the spatial microsimulation - the process generating the Synthetic Population. Preliminary updated versions for England and Wales, based on the UK census 2021, are available. However, Scotland has not yet published all required input data from its most recent census. |
| Missing Data | The level of missing information for a particular variable is determined by the levels of missingness in the underlying Understanding Society survey. |
| Examples / Link with Other Models and Data | The Synthetic Population is used as the underlying data source in several SIPHER models. These include: (1) dynamic systems model, (2) static and dynamic microsimulation and (3) decision support tool. Information covered in the Synthetic Population can be extended by adding additional variables from other data sources. These could be datasets that are not publicly available. In addition, the SIPHER Synthetic Population can be used to derive more complex concepts such as the ‘Equivalent Income’ - a variable which is calculated using the ‘Equivalent Income Calculator’ method. |
| Software Requirements | Requires software that can handle the size of the data file, such as R or Python. An interactive R Shiny dashboard allows code-free exploration of an aggregated version: https://sipherdashboard.sphsu.gla.ac.uk/ |
| Data Requirements / Restrictions | The SIPHER Synthetic Population is available for full independent use via the UK Data Service’s Curated Data Collection. To set up the SIPHER Synthetic Population, users need to link the synthetic population file (UK Data Service ID: SN9277) with Understanding Society survey data (UK Data Service ID: SN6614) - as is typically done for area-level linkages of surveys. Both datasets are subject to the General End-User License Agreement terms and conditions, and can be downloaded at no cost directly from the UK Data Service website. |
| Data / Code Available | Due to the underlying license agreement, the dataset cannot be shared as an open access version. However, the dataset can be downloaded through the UK Data Service website, after acceptance of the General End-User license terms and conditions: https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=9277#!/details In addition, we have made a wealth of supplementary material available, documenting creation, validation, linkage, and exploration of the dataset: https://reshare.ukdataservice.ac.uk/856754/ |
| Training | A comprehensive, open access User Guide for our SIPHER Synthetic Population provides background information and explains how to set up the data and analyse it swiftly: https://doc.ukdataservice.ac.uk/doc/9277/mrdoc/pdf/9277_user_guide_r4_clean.pdf |
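
The linkage and aggregation steps described in the table above (joining the synthetic population file to survey data, then summarising by area) can be illustrated with a minimal Python sketch. This is not the SIPHER code or the real file layout: the toy records, the field names (`zone_id`, `pidp`, `sex`, `in_employment`) and the helper functions `link` and `share_by_zone` are all assumptions made for illustration. The User Guide remains the reference for the actual variables and set-up procedure.

```python
from collections import defaultdict

# Toy stand-in for the synthetic population file: one row per synthetic
# person, pairing a small-area code with the ID of a survey respondent.
# Field names and values are illustrative assumptions, not the real schema.
synthetic_population = [
    {"zone_id": "A", "pidp": 1},
    {"zone_id": "A", "pidp": 2},
    {"zone_id": "A", "pidp": 1},  # a respondent can be drawn more than once
    {"zone_id": "B", "pidp": 3},
    {"zone_id": "B", "pidp": 2},
]

# Toy stand-in for the survey data: one row per respondent with attributes.
survey = [
    {"pidp": 1, "sex": "female", "in_employment": True},
    {"pidp": 2, "sex": "male", "in_employment": False},
    {"pidp": 3, "sex": "female", "in_employment": None},  # item is missing
]


def link(synthetic_rows, survey_rows, key="pidp"):
    """Attach survey attributes to every synthetic person via a shared key."""
    lookup = {row[key]: row for row in survey_rows}
    linked = []
    for person in synthetic_rows:
        attributes = lookup.get(person[key], {})
        linked.append({**attributes, **person})
    return linked


def share_by_zone(linked_rows, variable, zone_key="zone_id"):
    """Return, per zone, the share of people for whom `variable` is True.

    People with a missing value (None) are left out of the denominator and
    counted separately, because missingness carries over from the survey.
    """
    true_count = defaultdict(int)
    valid_count = defaultdict(int)
    missing_count = defaultdict(int)
    for row in linked_rows:
        zone = row[zone_key]
        value = row.get(variable)
        if value is None:
            missing_count[zone] += 1
            continue
        valid_count[zone] += 1
        true_count[zone] += int(value)
    zones = sorted(set(valid_count) | set(missing_count))
    return {
        zone: {
            "share": (true_count[zone] / valid_count[zone]
                      if valid_count[zone] else None),
            "n_valid": valid_count[zone],
            "n_missing": missing_count[zone],
        }
        for zone in zones
    }


linked_population = link(synthetic_population, survey)
employment_profile = share_by_zone(linked_population, "in_employment")
for zone, stats in employment_profile.items():
    print(zone, stats)
```

Reporting the number of missing values alongside each share keeps the survey's missingness visible in the area profile. With the full dataset, a data-frame library in R or Python would likely be a more practical choice than plain dictionaries, given the size of the file.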