Probabilistic linkage to enhance deterministic algorithms and reduce data linkage errors in hospital administrative data

Journal article


Hagger-Johnson, Gareth, Harron, Katie, Goldstein, Harvey, Aldridge, Rob and Gilbert, Ruth. (2017). Probabilistic linkage to enhance deterministic algorithms and reduce data linkage errors in hospital administrative data. Journal of Innovation in Health Informatics. 24(2), pp. 234 - 246. https://doi.org/10.14236/jhi.v24i2.891
Authors: Hagger-Johnson, Gareth, Harron, Katie, Goldstein, Harvey, Aldridge, Rob and Gilbert, Ruth
Abstract

Background: The pseudonymisation algorithm used to link together episodes of care belonging to the same patient in England [Hospital Episode Statistics ID (HESID)] has never been formally evaluated to determine the extent of data linkage error.

Objective: To quantify improvements in linkage accuracy from adding probabilistic linkage to the existing deterministic HESID algorithm.

Methods: We used inpatient admissions to National Health Service (NHS) hospitals in England (Hospital Episode Statistics, HES) over 17 years (1998 to 2015) for a sample of patients born on the 13th or 28th of any month in 1992, 1998, 2005 or 2012. We compared the existing deterministic algorithm with one that included an additional probabilistic step, against a reference standard created using enhanced probabilistic matching with additional clinical and demographic information. Missed and false matches were quantified, and the impact on estimates of hospital readmission within one year was determined.

Results: HESID produced a high missed-match rate, which improved over time (8.6% in 1998 to 0.4% in 2015). Missed matches were more common for ethnic minorities, people living in areas of high socio-economic deprivation, foreign patients and those with ‘no fixed abode’. Missed matches biased estimates of the readmission rate for several patient groups; the additional probabilistic step reduced these missed matches for nearly all groups.

Conclusion: Probabilistic linkage of HES reduced missed matches and bias in estimated readmission rates, with clear implications for commissioning, service evaluation and performance monitoring of hospitals. The existing algorithm should be modified to address data linkage error, and a retrospective update of the existing data would correct past linkage errors and their downstream effects.
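The abstract does not spell out the linkage rules themselves. As a minimal sketch only, a deterministic step followed by a probabilistic step, with missed- and false-match rates measured against a reference standard, might look like the Python below. The identifier fields, the m- and u-probabilities and the weight threshold are all hypothetical illustrations, not the HESID rules or values estimated in this study, and the probabilistic step uses textbook Fellegi-Sunter match weights, which may differ from the method the authors applied.

```python
import math
from itertools import combinations

# Hypothetical identifier fields; NOT the actual HESID field list or rules.
FIELDS = ("nhs_number", "dob", "sex", "postcode")

# Illustrative m-probabilities P(agree | same patient) and
# u-probabilities P(agree | different patients); assumed, not estimated.
M_PROB = {"nhs_number": 0.95, "dob": 0.98, "sex": 0.99, "postcode": 0.85}
U_PROB = {"nhs_number": 0.0001, "dob": 0.001, "sex": 0.5, "postcode": 0.01}


def deterministic_match(a, b):
    """Deterministic step: link only if every field is present and agrees exactly."""
    return all(a[f] is not None and a[f] == b[f] for f in FIELDS)


def match_weight(a, b):
    """Probabilistic step: Fellegi-Sunter log2 weight; missing values add 0."""
    weight = 0.0
    for f in FIELDS:
        if a[f] is None or b[f] is None:
            continue
        if a[f] == b[f]:
            weight += math.log2(M_PROB[f] / U_PROB[f])
        else:
            weight += math.log2((1 - M_PROB[f]) / (1 - U_PROB[f]))
    return weight


def link(records, use_probabilistic=True, threshold=10.0):
    """Return linked record pairs (i, j), i < j; the threshold is an assumption."""
    links = set()
    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        if deterministic_match(a, b):
            links.add((i, j))
        elif use_probabilistic and match_weight(a, b) >= threshold:
            links.add((i, j))
    return links


def error_rates(links, truth):
    """Missed-match rate (true pairs not linked), false-match rate (links not true)."""
    missed_rate = len(truth - links) / len(truth) if truth else 0.0
    false_rate = len(links - truth) / len(links) if links else 0.0
    return missed_rate, false_rate


# Toy episodes: records 0 and 1 are the same patient; record 1 lacks an NHS number.
records = [
    {"nhs_number": "111", "dob": "2005-03-13", "sex": "F", "postcode": "AB1 2CD"},
    {"nhs_number": None, "dob": "2005-03-13", "sex": "F", "postcode": "AB1 2CD"},
    {"nhs_number": "222", "dob": "2005-03-28", "sex": "M", "postcode": "XY9 8ZW"},
]
truth = {(0, 1)}  # reference-standard pairs for the toy data

print(error_rates(link(records, use_probabilistic=False), truth))  # (1.0, 0.0)
print(error_rates(link(records), truth))  # (0.0, 0.0)
```

In this toy case the strict deterministic rule misses the pair with a missing NHS number, and the probabilistic weight recovers it. In real use, m- and u-probabilities would be estimated from the data and comparisons restricted by blocking rather than compared across all pairs.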

Keywords: deterministic record linkage; evaluation; hospital discharge; probabilistic record linkage
Year: 2017
Journal: Journal of Innovation in Health Informatics
Journal citation: 24 (2), pp. 234 - 246
Publisher: BCS Learning and Development Limited
ISSN: 2058-4563
Digital Object Identifier (DOI): https://doi.org/10.14236/jhi.v24i2.891
Scopus EID: 2-s2.0-85042619898
Page range: 234 - 246
Research Group: Institute for Learning Sciences and Teacher Education (ILSTE)
Place of publication: United Kingdom
Permalink: https://acuresearchbank.acu.edu.au/item/8857q/probabilistic-linkage-to-enhance-deterministic-algorithms-and-reduce-data-linkage-errors-in-hospital-administrative-data

Related outputs

Enhanced use of educational accountability data to monitor educational progress of Australian students with focus on Indigenous students
Cumming, Joy, Goldstein, Harvey and Hand, Kirstine. (2020). Enhanced use of educational accountability data to monitor educational progress of Australian students with focus on Indigenous students. Educational Assessment, Evaluation and Accountability. 32, pp. 29-51. https://doi.org/10.1007/s11092-019-09310-x
Estimating reliability statistics and measurement error variances using instrumental variables with longitudinal data
Goldstein, Harvey, Haynes, Michele, Leckie, George and Tran, Phuong. (2020). Estimating reliability statistics and measurement error variances using instrumental variables with longitudinal data. Longitudinal and Life Course Studies. 11(3), pp. 289 - 306. https://doi.org/10.1332/175795920X15844303873216
Mindfulness-based intervention for educators: Effects of a school-based cluster randomized controlled study
Hwang, Yoon-Suk, Goldstein, Harvey, Medvedev, Oleg N., Singh, Nirbhay N., Noh, Jae-Eun and Hand, Kirstine Alicia. (2019). Mindfulness-based intervention for educators: Effects of a school-based cluster randomized controlled study. Mindfulness. 10(7), pp. 1417 - 1436. https://doi.org/10.1007/s12671-019-01147-1
A software package for the application of probabilistic anonymisation to sensitive individual-level data: A proof of principle with an example from the ALSPAC birth cohort study
Avraam, Demetris, Boyd, Andy, Goldstein, Harvey and Burton, Paul. (2018). A software package for the application of probabilistic anonymisation to sensitive individual-level data: A proof of principle with an example from the ALSPAC birth cohort study. Longitudinal and Life Course Studies. 9(4), pp. 433-446. https://doi.org/10.14301/llcs.v9i4.478
GUILD: guidance for information about linking data sets
Gilbert, Ruth, Lafferty, Rosemary, Hagger-Johnson, Gareth, Harron, Katie, Zhang, Li-Chun, Smith, Peter W.F., Dibben, Chris and Goldstein, Harvey. (2018). GUILD: guidance for information about linking data sets. Journal of Public Health. 40(1), pp. 191 - 198. https://doi.org/10.1093/pubmed/fdx037
Multilevel growth curve models that incorporate a random coefficient model for the level 1 variance function
Goldstein, Harvey, Leckie, George, Charlton, Christopher, Tilling, Kate and Browne, William J. (2018). Multilevel growth curve models that incorporate a random coefficient model for the level 1 variance function. Statistical Methods in Medical Research. 27(11), pp. 3478 - 3491. https://doi.org/10.1177/0962280217706728
Bayesian models for weighted data with missing values: a bootstrap approach
Goldstein, Harvey, Carpenter, James and Kenward, Michael G. (2018). Bayesian models for weighted data with missing values: a bootstrap approach. Journal of the Royal Statistical Society Series C: Applied Statistics. 67(4), pp. 1071 - 1081. https://doi.org/10.1111/rssc.12259
A guide to evaluating linkage quality for the analysis of linked data
Harron, Katie, Doidge, James C., Knight, Hannah E., Gilbert, Ruth, Goldstein, Harvey, Cromwell, David A. and van der Meulen, Jan H.. (2017). A guide to evaluating linkage quality for the analysis of linked data. International Journal of Epidemiology. 46(5), pp. 1699 - 1710. https://doi.org/10.1093/ije/dyx177
A Bayesian model for measurement and misclassification errors alongside missing data, with an application to higher education participation in Australia
Goldstein, Harvey, Browne, William J. and Charlton, Christopher. (2017). A Bayesian model for measurement and misclassification errors alongside missing data, with an application to higher education participation in Australia. Journal of Applied Statistics. 45(5), pp. 918 - 931. https://doi.org/10.1080/02664763.2017.1322558
A scaling approach to record linkage
Goldstein, Harvey, Harron, Katie and Cortina-Borja, Mario. (2017). A scaling approach to record linkage. Statistics in Medicine. 36(16), pp. 2514 - 2521. https://doi.org/10.1002/sim.7287
Utilising identifier error variation in linkage of large administrative data sources
Harron, Katie, Hagger-Johnson, Gareth, Gilbert, Ruth and Goldstein, Harvey. (2017). Utilising identifier error variation in linkage of large administrative data sources. BMC Medical Research Methodology. 17(1), pp. 1 - 9. https://doi.org/10.1186/s12874-017-0306-8
Challenges in administrative data linkage for research
Harron, Katie, Dibben, Chris, Boyd, James, Hjern, Anders, Azimaee, Mahmoud, Barreto, Mauricio L. and Goldstein, Harvey. (2017). Challenges in administrative data linkage for research. Big Data and Society. 4(2), pp. 1 - 12. https://doi.org/10.1177/2053951717745678
Integrating area-based and national samples in birth cohort studies: the case of life study
Goldstein, Harvey, Sera, Francesco, Elias, Peter and Dezateux, Carol. (2017). Integrating area-based and national samples in birth cohort studies: the case of life study. Longitudinal and Life Course Studies. 8(3), pp. 281 - 289. https://doi.org/10.14301/llcs.v8i3.439
The evolution of school league tables in England 1992-2016: ‘contextual value-added’, ‘expected progress’ and ‘progress 8’
Leckie, George and Goldstein, Harvey. (2017). The evolution of school league tables in England 1992-2016: ‘contextual value-added’, ‘expected progress’ and ‘progress 8’. British Educational Research Journal. 43(2), pp. 193 - 212. https://doi.org/10.1002/berj.3264
Handling attrition and non-response in longitudinal data with an application to a study of Australian youth
Cumming, Jacqueline Joy and Goldstein, Harvey. (2016). Handling attrition and non-response in longitudinal data with an application to a study of Australian youth. Longitudinal and Life Course Studies. 7(1), pp. 53 - 63. https://doi.org/10.14301/llcs.v7i1.342
Record linkage
Goldstein, Harvey and Harron, Katie. (2016). Record linkage. In K. Harron, H. Goldstein and C. Dibben (Eds.), Methodological developments in data linkage. John Wiley & Sons.
Trends in examination performance and exposure to standardised tests in England and Wales
Goldstein, Harvey and Leckie, George. (2016). Trends in examination performance and exposure to standardised tests in England and Wales. British Educational Research Journal. 42(3), pp. 367 - 375. https://doi.org/10.1002/berj.3220
Interviewer effects on non-response propensity in longitudinal surveys: A multilevel modelling approach
Vassallo, Rebecca, Durrant, Gabriele, Smith, Peter and Goldstein, Harvey. (2015). Interviewer effects on non-response propensity in longitudinal surveys: A multilevel modelling approach. Journal of the Royal Statistical Society Series A: Statistics in Society. 178(1), pp. 83 - 99. https://doi.org/10.1111/rssa.12049
A multilevel modelling approach to measuring changing patterns of ethnic composition and segregation among London secondary schools, 2001-2010
Leckie, George and Goldstein, Harvey. (2015). A multilevel modelling approach to measuring changing patterns of ethnic composition and segregation among London secondary schools, 2001-2010. Journal of the Royal Statistical Society Series A: Statistics in Society. 178(2), pp. 405 - 424. https://doi.org/10.1111/rssa.12066
Validity, science and educational measurement
Goldstein, Harvey. (2015). Validity, science and educational measurement. Assessment in Education: Principles, Policy & Practice. 22(2), pp. 193 - 201. https://doi.org/10.1080/0969594X.2015.1015402
Population sampling in longitudinal surveys
Goldstein, Harvey, Lynn, Peter, Muniz-Terrera, Graciela, Hardy, Rebecca, O'Muircheartaigh, Colm, Skinner, Chris and Lehtonen, Risto. (2015). Population sampling in longitudinal surveys. Longitudinal and Life Course Studies (online). 6(4), pp. 447 - 452. https://doi.org/10.14301/llcs.v6i4.345
After the RCT: Who comes to a family-based intervention for childhood overweight or obesity when it is implemented at scale in the community?
Fagg, James, Cole, Tim, Cummins, Steven, Goldstein, Harvey, Morris, Stephen, Radley, Duncan, Sacher, Paul and Law, Catherine. (2015). After the RCT: Who comes to a family-based intervention for childhood overweight or obesity when it is implemented at scale in the community? Journal of Epidemiology and Community Health. 69(2), pp. 142 - 148. https://doi.org/10.1136/jech-2014-204155
Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records
Hagger-Johnson, Gareth, Harron, Katie, Fleming, Tom, Gilbert, Ruth, Goldstein, Harvey, Landy, Rebecca and Parslow, Roger. (2015). Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records. BMJ Open. 5(8), pp. 1 - 8. https://doi.org/10.1136/bmjopen-2015-008118
Identifying possible false matches in anonymized hospital administrative data without patient identifiers
Hagger-Johnson, Gareth, Harron, Katie, Gonzalez-Izquierdo, Arturo, Cortina-Borja, Mario, Dattani, Nirupa, Muller-Pebody, Berit, Parslow, Roger, Gilbert, Ruth and Goldstein, Harvey. (2015). Identifying possible false matches in anonymized hospital administrative data without patient identifiers. Health Services Research. 50(4), pp. 1162 - 1178. https://doi.org/10.1111/1475-6773.12272
Evaluating bias due to data linkage error in electronic healthcare records
Harron, Katie, Wade, Angie, Gilbert, Ruth, Muller-Pebody, Berit and Goldstein, Harvey. (2014). Evaluating bias due to data linkage error in electronic healthcare records. BMC Medical Research Methodology. 14(1), pp. 1 - 10. https://doi.org/10.1186/1471-2288-14-36
Fitting multilevel multivariate models with missing data in responses and covariates that may include interactions and non-linear terms
Goldstein, Harvey, Carpenter, James and Browne, William. (2014). Fitting multilevel multivariate models with missing data in responses and covariates that may include interactions and non-linear terms. Journal of the Royal Statistical Society Series A: Statistics in Society. 177(2), pp. 553 - 564. https://doi.org/10.1111/rssa.12022
Panel attrition: How important is interviewer continuity?
Lynn, Peter, Kaminska, Olena and Goldstein, Harvey. (2014). Panel attrition: How important is interviewer continuity? Journal of Official Statistics. 30(3), pp. 443 - 457. https://doi.org/10.2478/JOS-2014-0028
Modelling survival and mortality risk to 15 years of age for a national cohort of children with serious congenital heart defects diagnosed in infancy
Knowles, Rachel L., Bull, Catherine, Wren, Christopher, Wade, Angela, Goldstein, Harvey and Dezateux, Carol. (2014). Modelling survival and mortality risk to 15 years of age for a national cohort of children with serious congenital heart defects diagnosed in infancy. PLoS ONE. 9(8), pp. 1 - 15. https://doi.org/10.1371/journal.pone.0106806
From trial to population: A study of a family-based community intervention for childhood overweight implemented at scale
Fagg, James, Chadwick, P., Cole, Tim, Cummins, Steven, Goldstein, Harvey, Lewis, H., Morris, Sue, Radley, Duncan, Sacher, Paul and Law, Catherine. (2014). From trial to population: A study of a family-based community intervention for childhood overweight implemented at scale. International Journal of Obesity. 38(10), pp. 1343 - 1349. https://doi.org/10.1038/ijo.2014.103
Using league table rankings in public policy formation: Statistical issues
Goldstein, Harvey. (2014). Using league table rankings in public policy formation: Statistical issues. Annual Review of Statistics and Its Application. 1(1), pp. 385 - 399. https://doi.org/10.1146/annurev-Statistics-022513-115615
Knowledge and numbers in education
Goldstein, Harvey and Moss, Gemma. (2014). Knowledge and numbers in education. Comparative Education. 50(3), pp. 259 - 265. https://doi.org/10.1080/14681366.2014.926138
Adjusting for differential misclassification in multilevel models: The relationship between child exposure to smoke and cognitive development
Ferrao, Maria and Goldstein, Harvey. (2014). Adjusting for differential misclassification in multilevel models: The relationship between child exposure to smoke and cognitive development. Quality and Quantity (Print). 48(1), pp. 251 - 258. https://doi.org/10.1007/s11135-012-9765-5
University mission creep? Comparing EU and US faculty views of university involvement in regional economic development and commercialization
Goldstein, Harvey, Bergman, Edward M. and Maier, Gunther. (2013). University mission creep? Comparing EU and US faculty views of university involvement in regional economic development and commercialization. The Annals of Regional Science. 50(2), pp. 453 - 477. https://doi.org/10.1007/s00168-012-0513-5
Evaluating educational changes: A statistical perspective
Goldstein, Harvey. (2013). Evaluating educational changes: A statistical perspective. Ensaio: Avaliacao e Politicas Publicas em Educacao. 21(78), pp. 101 - 114. https://doi.org/10.1590/S0104-40362013005000002
Linkage, Evaluation and Analysis of National Electronic Healthcare Data: Application to Providing Enhanced Blood-Stream Infection Surveillance in Paediatric Intensive Care
Harron, Katie, Goldstein, Harvey, Wade, Angie, Muller-Pebody, Berit, Parslow, Roger and Gilbert, Ruth. (2013). Linkage, Evaluation and Analysis of National Electronic Healthcare Data: Application to Providing Enhanced Blood-Stream Infection Surveillance in Paediatric Intensive Care. PLoS ONE. 8(12), pp. 1 - 11. https://doi.org/10.1371/journal.pone.0085278
Risk-adjusted monitoring of blood-stream infection in paediatric intensive care: A data linkage study
Harron, Katie, Wade, Angie, Muller-Pebody, Berit, Goldstein, Harvey, Parslow, Roger, Gray, Jim, Hartley, John, Mok, Quen and Gilbert, Ruth. (2013). Risk-adjusted monitoring of blood-stream infection in paediatric intensive care: A data linkage study. Intensive Care Medicine. 39(6), pp. 1080 - 1087. https://doi.org/10.1007/s00134-013-2841-z
Transitioning to the new economy: Individual, regional and intermediation influences on workforce retraining outcomes
Goldstein, H. A., Lowe, N. and Donegan, M. (2012). Transitioning to the new economy: Individual, regional and intermediation influences on workforce retraining outcomes. Regional Studies. 46(1), pp. 105 - 118. https://doi.org/10.1080/00343404.2010.486786
Francis Galton, measurement, psychometrics and social progress
Goldstein, Harvey. (2012). Francis Galton, measurement, psychometrics and social progress. Assessment in Education: Principles, Policy & Practice. 19(2), pp. 147 - 158. https://doi.org/10.1080/0969594X.2011.614220
The quality of planning scholarship and doctoral education
Goldstein, Harvey A. (2012). The quality of planning scholarship and doctoral education. Journal of Planning Education and Research. 32(4), pp. 493 - 496. https://doi.org/10.1177/0739456X12449484
Multilevel Modeling of Social Segregation
Leckie, George, Pillinger, Rebecca, Jones, Kelvyn and Goldstein, Harvey. (2012). Multilevel Modeling of Social Segregation. Journal of Educational and Behavioral Statistics. 37(1), pp. 3 - 30. https://doi.org/10.3102/1076998610394367
The analysis of record-linked data using multiple imputation with data value priors
Goldstein, Harvey, Harron, Katie and Wade, Angie. (2012). The analysis of record-linked data using multiple imputation with data value priors. Statistics in Medicine. 31(28), pp. 3481 - 3493. https://doi.org/10.1002/sim.5508
Measuring success: League tables in the public sector
Foley, Beth and Goldstein, Harvey. (2012). Measuring success: League tables in the public sector. London, United Kingdom: The British Academy.
REALCOM-IMPUTE Software for Multilevel Multiple Imputation with Mixed Response Types
Carpenter, James, Goldstein, Harvey and Kenward, Michael. (2011). REALCOM-IMPUTE Software for Multilevel Multiple Imputation with Mixed Response Types. Journal of Statistical Software. 45(5), pp. 1 - 14.
A note on 'The limitations of school league tables to inform school choice'
Leckie, George and Goldstein, Harvey. (2011). A note on 'The limitations of school league tables to inform school choice'. Journal of the Royal Statistical Society Series A: Statistics in Society. 174(3), pp. 833 - 836. https://doi.org/10.1111/j.1467-985X.2010.00688.x
Estimating research performance by using research grant award gradings
Goldstein, Harvey. (2011). Estimating research performance by using research grant award gradings. Journal of the Royal Statistical Society Series A: Statistics in Society. 174(1), pp. 83 - 93. https://doi.org/10.1111/j.1467-985X.2010.00657.x
Understanding uncertainty in school league tables
Leckie, George and Goldstein, Harvey. (2011). Understanding uncertainty in school league tables. Fiscal Studies. 32(2), pp. 207 - 224. https://doi.org/10.1111/j.1475-5890.2011.00133.x
Patchwork intermediation: Challenges and opportunities for regionally coordinated workforce development
Lowe, Nichola, Goldstein, Harvey and Donegan, Mary. (2011). Patchwork intermediation: Challenges and opportunities for regionally coordinated workforce development. Economic Development Quarterly: the journal of American economic revitalization. 25(2), pp. 158 - 171. https://doi.org/10.1177/0891242410383413
Pupil composition and accountability: An analysis in English primary schools
Kounali, Daphne, Robinson, Anthony, Lauder, Hugh and Goldstein, Harvey. (2010). Pupil composition and accountability: An analysis in English primary schools. International Journal of Educational Research. 49(2-3), pp. 49 - 68. https://doi.org/10.1016/j.ijer.2010.08.001
MCMC sampling for a multilevel model with nonindependent residuals within and between cluster units
Browne, William and Goldstein, Harvey. (2010). MCMC sampling for a multilevel model with nonindependent residuals within and between cluster units. Journal of Educational and Behavioral Statistics. 35(4), pp. 453 - 473. https://doi.org/10.3102/1076998609359788
Statistical modelling of repeated measurement data
Goldstein, Harvey. (2010). Statistical modelling of repeated measurement data. Longitudinal and Life Course Studies. 1(2), pp. 170 - 185. https://doi.org/10.14301/llcs.v1i2.67
Handling attrition and non-response in longitudinal data
Goldstein, Harvey. (2009). Handling attrition and non-response in longitudinal data. Longitudinal and Life Course Studies.
Multilevel multivariate modelling of childhood growth, numbers of growth measurements and adult characteristics
Goldstein, Harvey and Kounali, Daphne. (2009). Multilevel multivariate modelling of childhood growth, numbers of growth measurements and adult characteristics. Journal of the Royal Statistical Society Series A: Statistics in Society.
Multilevel models with multivariate mixed response types
Goldstein, Harvey, Carpenter, James R., Kenward, Michael G. and Levin, Kate A. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling.
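Comment peut-on utiliser les études comparatives internationales pour doter les politiques éducatives d'information fiables? [How can international comparative studies be used to provide education policies with reliable information?]
Goldstein, Harvey. (2009). Comment peut-on utiliser les études comparatives internationales pour doter les politiques éducatives d'information fiables? [How can international comparative studies be used to provide education policies with reliable information?] Revue Française de Pédagogie.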
Comment: Citation Statistics
Goldstein, Harvey and Spiegelhalter, David. (2009). Comment: Citation Statistics. Statistical Science.
The limitations of using school league tables to inform school choice
Leckie, George and Goldstein, Harvey. (2009). The limitations of using school league tables to inform school choice. Journal of the Royal Statistical Society Series A: Statistics in Society.
Evidence and education policy - some reflections and allegations
Goldstein, Harvey. (2008). Evidence and education policy - some reflections and allegations. Cambridge Journal of Education.
Adjusting for measurement error in the value added model: evidence from Portugal
Ferrao, Maria Eugenia and Goldstein, Harvey. (2008). Adjusting for measurement error in the value added model: evidence from Portugal. Quality and Quantity. 43, pp. 951 - 963. https://doi.org/10.1007/s11135-008-9171-1
Review of 'Monitoring Educational Achievement'
Goldstein, Harvey. (2008). Review of 'Monitoring Educational Achievement'. International Journal of Educational Development.
Modelling measurement errors and category misclassifications in multilevel models
Goldstein, Harvey, Kounali, Daphne and Robinson, Anthony. (2008). Modelling measurement errors and category misclassifications in multilevel models. Statistical Modelling.
School league tables: what can they really tell us?
Goldstein, Harvey. (2008). School league tables: what can they really tell us? Significance.
The effects of year repetition (redoublement) on the progress of pupils in the first three years of French schooling
Goldstein, Harvey. (2008). The effects of year repetition (redoublement) on the progress of pupils in the first three years of French schooling.
Techniques for Monitoring the Comparability of Examination Standards
Goldstein, Harvey. (2007). Techniques for Monitoring the Comparability of Examination Standards. Qualifications and Curriculum Authority.