Transforming personal data

Our research into experimental data is helping decision-making for national and international evidence-based policies. In addition, our work provided vital information for the Scottish Government's response to the COVID pandemic.


Integration of administrative and population data sources such as medical records and household composition is pivotal for evidence-based policymaking.

Authorised use of linked whole population data, though potentially extremely powerful, has been limited in many settings, nationally and internationally, by informational gaps, data protection regulations and privacy laws. 

Through Administrative Data Research Scotland (ADR Scotland), we have established a series of foundational research programmes enabling the introduction of novel data access and linkage approaches while ensuring data safety through anonymisation and security measures.  This provides a fuller integrated use of routinely collected health and social administrative data.

Led by Professor Chris Dibben, our work included developing a set of measures enabling enhanced, legally compliant and secure access to sensitive personal data for professional training purposes and evidence-based policymaking.  As a result, these innovations have enabled training the next generation of professional information and data scientists, equipped with novel data analysis forums. 

Our underpinning research into experimental data has informed evidence-based policy by UK government agencies, police forces and the National Health Service.  In addition, our work on grouping populations into 'households', notably care homes, proved critical in influencing the Scottish Government's response to the COVID pandemic.

Administrative Data Research Scotland (ADR Scotland)

ADR Scotland is a partnership combining specialists in the Scottish Government's Data Sharing and Linkage Unit with academic researchers' expertise at the Scottish Centre for Administrative Data Research (SCADR).  Together, we are transforming how public sector data in Scotland is curated, accessed, and explored to deliver its full potential for policymakers and the public.  

Professor Chris Dibben is Co-Director for ADR and Director for SCADR.

Highlights

We have provided some highlights on how we are transforming access to sensitive personal data, including enhancing data infrastructure, access and capacity for policy development.

Through ADR Scotland, we have developed the novel legal concept of functional anonymisation.  A dataset containing personal information can be treated as legally anonymous if it is 'not reasonably likely', rather than 'not possible', for a person's identity to be deduced from the data. Hence very detailed information can legally be released to analysts.

We have also developed a novel concept of 'embassy' micro-safe infrastructures such as SafePods.  These provide a fully controlled and consistent environment for data analysis remote from the main data centres but controlled by those centres – allowing the data to be treated as functionally anonymous because of the controlled environment.

The concept of functional anonymisation has been used as the core legal justification for releasing data for analysis in Wales, England and Scotland. It is a cornerstone concept for advisory services such as the UK Anonymisation Network and the EU funded 'Data Pitch' which works with corporates and public sector organisations to use shared data to build sustainable businesses that generate economic, social and environmental impact.

'Embassy safe research spaces' have been developed for national organisations such as the UK Office for National Statistics (ONS) and the Welsh Secure Anonymised Information Databank. The Economic and Social Research Council (ESRC) UK Data Service has significantly extended the locations where sensitive data can be accessed, with over 25 locations now across the UK.  

SafePod network

The SafePod Network is a new service designed to improve data research and access for the public benefit by providing and operating a network of independent, safe settings (known as SafePods).  

As SafePods are rolled out across the country, travel times are significantly reduced, and accessibility and capacity for a wide range of policy-related research are increasing. This means much more analysis is now possible by the national organisations.

You can find out more about the SafePod network on the Scottish Centre for Administrative Data Research (SCADR) website.


During the COVID-19 pandemic, it was paramount for government and health agencies to have detailed information quickly. To understand transmission, it was especially important to know who was living together.

Our tools for administrative data research, such as linking medical data to property locations, have given a clearer picture of settings such as care homes and households, which in turn supported vital COVID-19 research for the Government.

With ADR Scotland, we have developed novel methods for assembling data so it can be used to explore key characteristics of populations. For example, to form an understanding of housing, families and households (which are not recorded in UK administrative data), we developed the CHI-UPRN Residential Linkage (CURL) tool, which links Community Health Index (CHI) numbers and Unique Property Reference Numbers (UPRNs). The entire Scottish population was probabilistically linked to their exact residence.  This enabled people to be grouped into 'households' and the nature of each household to be understood, such as whether it is a care home.  Work using this tool proved critical in influencing the Scottish Government's response to the COVID pandemic.
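The idea of linking person records to properties and then grouping people into households can be illustrated with a minimal sketch. This is not the CURL tool itself: it assumes a simple text-similarity score with a fixed threshold in place of a full probabilistic model, and every identifier, address and postcode below is hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalise(address):
    """Lower-case the address, drop commas and collapse repeated spaces."""
    return " ".join(address.lower().replace(",", " ").split())


def best_property(address, postcode, gazetteer, threshold=0.8):
    """Return (property_id, score) for the most similar address in the same postcode.

    property_id is None when no candidate reaches the (assumed) threshold.
    """
    best_id, best_score = None, 0.0
    for prop_id, (prop_address, prop_postcode) in gazetteer.items():
        if prop_postcode != postcode:
            continue  # only compare addresses that share a postcode
        score = SequenceMatcher(
            None, normalise(address), normalise(prop_address)
        ).ratio()
        if score > best_score:
            best_id, best_score = prop_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


def group_households(people, gazetteer):
    """Group person IDs by the property each one was linked to."""
    households = defaultdict(list)
    for person_id, (address, postcode) in people.items():
        prop_id, _ = best_property(address, postcode, gazetteer)
        if prop_id is not None:
            households[prop_id].append(person_id)
    return dict(households)


# Hypothetical property gazetteer: property ID -> (address, postcode).
gazetteer = {
    "P1": ("12 High Street, Flat 2", "AB1 2CD"),
    "P2": ("14 High Street", "AB1 2CD"),
    "P3": ("Rosebank Care Home, 3 Mill Road", "EF3 4GH"),
}

# Hypothetical health-register records: person ID -> (address, postcode).
people = {
    "A": ("12 High Street Flat 2", "AB1 2CD"),
    "B": ("12 HIGH STREET,  FLAT 2", "AB1 2CD"),
    "C": ("Rosebank Care Home 3 Mill Road", "EF3 4GH"),
    "D": ("14 High Stret", "AB1 2CD"),   # typo still links to P2
    "E": ("1 Unknown Lane", "ZZ9 9ZZ"),  # no property in this postcode
}

households = group_households(people, gazetteer)
```

Here persons A and B are grouped into one household at property P1, while person E stays unlinked. In practice the linkage score, the blocking rule and the threshold would all need to be chosen and validated against real data.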

The ADR Scotland measure of households, enabled by the CURL tool, meant that Public Health Scotland could provide Scottish Ministers with information on transmission from hospitals to care homes. The CURL tool was also used to inform the key Scottish Government report 'Discharges from NHS Scotland Hospitals to Care Homes'. 

Professor Chris Dibben was part of the Scottish Government’s initial COVID-19 Taskforce. He is also a member of their COVID-19 Data and Intelligence Network, helping to inform the Government’s data response to the pandemic.

We have developed state-of-the-art 'administrative data enhancements' that have led to a broader and more varied range of linked personal administrative data being made more widely available. Secure and private by design, these approaches have made this enhanced provision legally compliant. These innovations have been critical to the development of increased national-level provision of data in Scotland.  

These 'administrative data enhancements' are also being used in the wider UK and internationally, including national statistical and government agencies such as the UK Office for National Statistics (ONS) and Statistics Canada.   Collectively, many of our approaches have enhanced the capability and capacity for research within and for governments at a multi-national level.

Some examples of our underpinning research, led by Professor Dibben, include:

eDatashield

A capacity to combine datasets is crucial. This increases the size of research datasets and enables comparative research, but legal restrictions can prevent agencies sharing data across borders. 

A statistical process and software programme have been developed to allow remote and non-disclosive analyses of sensitive data to be carried out via the eDatashield protocol. The protocol exchanges non-disclosive statistical summaries between agencies, making it possible for exact statistical results to be calculated.
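To see how exchanging summaries can still give exact results, consider a minimal sketch of a pooled linear regression. It is an illustration of the general principle only, not the eDatashield protocol: the function names and the toy data are hypothetical, and a real deployment would also need disclosure checks, for example on very small groups.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Aggregate statistics an agency shares instead of row-level records."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarise(xs, ys):
    """Run inside each agency: reduce local records to aggregate sums."""
    return Summary(
        n=len(xs),
        sum_x=sum(xs),
        sum_y=sum(ys),
        sum_xx=sum(x * x for x in xs),
        sum_xy=sum(x * y for x, y in zip(xs, ys)),
    )


def pooled_regression(summaries):
    """Combine shared summaries into the least-squares line y = a + b*x."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sum_x for s in summaries)
    sy = sum(s.sum_y for s in summaries)
    sxx = sum(s.sum_xx for s in summaries)
    sxy = sum(s.sum_xy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical toy data held by two agencies that cannot exchange records.
agency_a = ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
agency_b = ([4.0, 6.0], [9.0, 13.0])

intercept, slope = pooled_regression(
    [summarise(*agency_a), summarise(*agency_b)]
)
```

Because the least-squares formulas depend on the data only through these sums, the combined result is the same as if the records had been pooled in one place, yet no individual record leaves either agency.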

The eDatashield approach has been used across the UK to allow comparative research previously proscribed by law. For the first time, this has allowed research across all three of the UK's Census Longitudinal Studies. For example, in 2016 it allowed the Scottish Government's Glasgow Centre for Population Health to better understand ill-health in Glasgow through comparison with similar de-industrialising towns in England.

You can find out more about eDatashield on the Scottish Centre for Administrative Data Research (SCADR) website.

Synthpop

Synthetic data allows the widespread release of otherwise sensitive data. It mimics the real data and preserves the relationships between variables but is safe to release because the data is ‘artificial’. Although synthetic data was a developing area in the literature, no software packages existed that made these methods easy to implement. We resolved a number of significant methodological issues and developed a new software package: ‘Synthpop’. This involved novel methods for making inference and for estimating the utility and privacy of the synthetic data.
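A minimal sketch can show how synthetic records might mimic relationships between variables without copying whole real records. It is not the Synthpop package or its interface: it assumes a much simpler scheme in which each variable is drawn from real values that share the previously synthesised variable, and the column names and records are hypothetical.

```python
import random
from collections import defaultdict


def synthesise(rows, columns, n, seed=0):
    """Build n synthetic rows one column at a time.

    The first column is resampled from its observed values. Each later
    column is drawn from the real values seen alongside the synthetic
    row's value in the preceding column (a simplifying assumption).
    """
    rng = random.Random(seed)
    first = columns[0]
    synthetic = [{first: rng.choice(rows)[first]} for _ in range(n)]
    for prev, col in zip(columns, columns[1:]):
        # Index real values of `col` by the value of the preceding column.
        donors = defaultdict(list)
        for row in rows:
            donors[row[prev]].append(row[col])
        for syn in synthetic:
            syn[col] = rng.choice(donors[syn[prev]])
    return synthetic


# Hypothetical 'real' records; in practice these would be sensitive data.
real_rows = [
    {"age_band": "16-34", "employment": "employed", "health": "good"},
    {"age_band": "16-34", "employment": "student", "health": "good"},
    {"age_band": "35-64", "employment": "employed", "health": "fair"},
    {"age_band": "35-64", "employment": "employed", "health": "good"},
    {"age_band": "65+", "employment": "retired", "health": "fair"},
    {"age_band": "65+", "employment": "retired", "health": "poor"},
]
columns = ["age_band", "employment", "health"]

synthetic = synthesise(real_rows, columns, n=5, seed=1)
```

Each synthetic row keeps the pairwise patterns seen in adjacent columns, but the row as a whole need not match any real person. A scheme this simple offers no privacy guarantee on its own, which is why estimating the utility and privacy of the output matters.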

The 'Synthpop' package considerably simplified the production of safe, high-utility synthetic versions of otherwise sensitive private data. It was made available to practitioners in 2014 and has been downloaded over 23,000 times across 129 different countries.

Users of Synthpop include:

  • Institute for Employment Research, Germany
  • Labor Dynamic Institute, Cornell University
  • Open Source Policy Center, American Enterprise Institute, USA

Synthpop has also enabled creative engagement with data.  For example, Statistics Canada used it to produce an analytically rich synthetic data file for use during external 'codefest' events. One such 'codefest', run in collaboration with IBM, tested cloud-based tools, and international teams produced new visualisation suites for Statistics Canada linked data.

It is also used in the private sector, as the simplicity of Synthpop and the quality of the data generated make it a great resource for industry.

You can find out more about Synthpop on the website.
