Data Resource Profile: Yorkshire Specialist Register of Cancer in Children and Young People (Yorkshire Register)

Kirsten J Cromie,* Paul Crump, Nicola F Hughes, Sarah Milner, Diana Greenfield, Anna Jenkins, Richard McNally, Dan Stark, Charles A Stiller, Adam W Glaser and Richard G Feltbower
Leeds Institute for Data Analytics, School of Medicine, Clinical and Population Sciences Department, University of Leeds, Leeds, UK, Leeds Institute of Medical Research, School of Medicine, University of Leeds, Leeds, UK, Sheffield Children’s NHS Foundation Trust, Haematology and Oncology Department, Sheffield, UK, Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK and National Disease Registration Service, NHS Digital, England


Data resource basics
Cancer is a rare disease in children and young people 8,9 and is commonly defined to include any malignant tumour or benign central nervous system (CNS) tumour. An average of 1645 cases of childhood cancer (0-14 years) are diagnosed each year in the UK, and 2110 per year in 15- to 24-year-olds. 8 Registrations of newly diagnosed cases of cancer in individuals aged 0-29 years account for just 1-2% of all cancer registrations in England. 10 Despite this, cancer places a considerable burden not only upon the individuals themselves but also on their families and the healthcare system. 10,11 Much remains unknown about the aetiology and long-term outcomes of cancer in this young age group. 10 The Yorkshire Specialist Register of Cancer in Children and Young People (Yorkshire Register) 1 was established in 1984 to address this deficit, providing a basis for world-leading epidemiological and outcomes research examining the patterns and causes of cancer in children and young people and undertaking applied health services research generating novel insights into cancer outcomes. The Yorkshire Register specifically aims to collect comprehensive diagnostic, treatment and outcomes information that adds value and is not readily available from other routine National Health Service (NHS) sources, including the National Cancer Registration and Analysis Service (NCRAS). 5 Data from individuals held in the Yorkshire Register are linked to a suite of administrative health and non-health-related data sources including secondary care [Hospital Episode Statistics 3 (HES)], death registration records and the National Pupil Database (NPD) 6 using NHS number, date of birth, sex and patient postcode at diagnosis.
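The linkage described above can be illustrated with a minimal Python sketch. The matching rules used for the Register are not specified here, so the logic below is an assumption: an exact match on NHS number, with a fallback to date of birth, sex and normalized postcode at diagnosis. The `Person` fields and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    nhs_number: Optional[str]  # may be missing in older records
    date_of_birth: str         # ISO format, e.g. "2001-05-17"
    sex: str
    postcode: str              # postcode at diagnosis


def normalise_postcode(postcode: str) -> str:
    """Remove spaces and upper-case so 'ls2 9jt' matches 'LS2 9JT'."""
    return postcode.replace(" ", "").upper()


def link(register: list[Person], external: list[Person]) -> dict[int, int]:
    """Return {register index: external index} for matched records.

    Step 1 matches on NHS number. Step 2 (an assumed fallback, not the
    Register's documented method) matches the remaining records on
    date of birth, sex and normalised postcode.
    """
    by_nhs = {p.nhs_number: j for j, p in enumerate(external) if p.nhs_number}
    by_demo = {
        (p.date_of_birth, p.sex, normalise_postcode(p.postcode)): j
        for j, p in enumerate(external)
    }
    links: dict[int, int] = {}
    for i, p in enumerate(register):
        if p.nhs_number and p.nhs_number in by_nhs:
            links[i] = by_nhs[p.nhs_number]
            continue
        key = (p.date_of_birth, p.sex, normalise_postcode(p.postcode))
        if key in by_demo:
            links[i] = by_demo[key]
    return links
```

Records that match on neither step are simply left unlinked in this sketch; a production linkage would normally also handle duplicates, partial identifiers and clerical review.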
The Yorkshire Register is held on an encrypted, firewall-protected secure platform within Leeds Institute for Data Analytics at the University of Leeds. 2 Subject to ongoing research funding, the data will be held indefinitely, enabling the accrual of an ever-increasing data set relating to cancer in young people and allowing more powerful statistical comparisons to be performed.
The Yorkshire Register is a regional population-based research database of cancer diagnoses. The geographical area covered by the Yorkshire Register is contiguous with the former Yorkshire and the Humber Strategic Health Authority (Yorkshire, UK) (Figure 1). 2 The region covers a population of 5.5 million, 2 million of whom are aged <30 years. 12 Individuals aged <30 years diagnosed with a primary malignant (or benign CNS) tumour whilst resident in the area are eligible for inclusion on the Yorkshire Register, even if they are diagnosed outside of the Yorkshire area.
The Yorkshire region comprises a diverse mix of urban and rural communities, with a significant ethnic minority population that is predominantly of south Asian origin, comprising 6% of the Yorkshire & the Humber population in the 2011 Census. 13 An estimated 60% of the south Asian population in Yorkshire also originates from Mirpur in rural Pakistan. 14 This makes Yorkshire one of the few regions in the UK that allows detailed analysis of a relatively homogeneous, second-and third-generation south Asian population.
As of 15 February 2022, the Yorkshire Register held data on 11 702 tumours diagnosed in 11 482 individuals, representing 0.6% of the population of under-30-year-olds in Yorkshire. 12 Eight thousand four hundred and ninety-five (74%) of those individuals were alive at the time of data extraction (Table 1).
Key demographic information on all individuals held in the Yorkshire Register is presented in Table 1. Ethnicity information is drawn from electronic health records and linked HES data. 16 Where this information was missing historically, results from Onomap naming algorithms were used. 17 Yorkshire Register patient ethnicity assignment has been described in detail previously. 17 Combining ethnicity data from multiple sources ensures that ethnicity data are as complete as possible for individuals in the Register.
The Townsend deprivation index is used as a measure of area-based material deprivation for each individual. 15 The index is based on official statistics on rates of unemployment, non-home ownership, household over-crowding and non-car ownership (Supplementary Box S1, available as Supplementary data at IJE online); 27% of children (0-14 years) and 26% of 15- to 29-year-olds are in the most deprived fifth (V). Compared with 20% for England as a whole, this indicates that the Yorkshire childhood and young-adult cancer population is slightly more deprived.
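As an illustration of how an area-based Townsend score can be derived from the four census indicators listed above, the following Python sketch sums standardized (z-score) components, log-transforming the unemployment and overcrowding percentages as in the commonly used formulation of the index. The specification used for the Register is given in Supplementary Box S1 and may differ; the field names, example inputs and rank-based assignment to fifths are assumptions made for illustration only.

```python
import math
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def townsend_scores(areas: list[dict[str, float]]) -> list[float]:
    """Sum four standardized indicators per area; higher = more deprived."""
    unemployed = z_scores([math.log(a["unemployed_pct"] + 1) for a in areas])
    overcrowded = z_scores([math.log(a["overcrowded_pct"] + 1) for a in areas])
    no_car = z_scores([a["no_car_pct"] for a in areas])
    not_owned = z_scores([a["not_owner_occupied_pct"] for a in areas])
    return [sum(parts) for parts in zip(unemployed, overcrowded, no_car, not_owned)]


def fifths(scores: list[float]) -> list[int]:
    """Assign 1 (least deprived) to 5 (most deprived) by rank of score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, i in enumerate(order):
        result[i] = rank * 5 // len(scores) + 1
    return result
```

In this sketch the fifths are cut on the areas supplied. To reproduce a comparison against the 20% expected for England, the cut points would need to be derived from the national distribution of scores rather than from the regional sample.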

Data collected
Core data collection
A copy of the Data Collection Form can be found on the Yorkshire Register website (https://ysrccyp.org.uk/wp-content/uploads/sites/103/2021/07/Protocol-April-2021-Data-Collection-Form.pdf).
The primary sources of notifications for paediatric tumours are the two Principal Treatment Centres in Yorkshire [Leeds Teaching Hospitals Trust (LTHT) and Sheffield Children's Hospital NHS Foundation Trust (SCH)]. For 16- to 24-year-olds, the primary source is the teenage and young-adult multidisciplinary teams at LTHT and Sheffield Teaching Hospitals NHS Foundation Trust. For all other ages (25-29 years), the primary source of notifications is NCRAS. To improve the efficiency of Yorkshire Register data collection, an electronic feed has been established from local hospital patient management and pathology systems [including Patient Pathway Manager (PPM) 18 in Leeds, with a view to expanding across all NHS Trusts in Yorkshire] as well as from NCRAS. 5 The Neuropathology department at LTHT provides an additional secondary source of notifications for all 0- to 29-year-olds referred for diagnosis and/or treatment, and the Haematological Malignancy Diagnostic Service does so for all haematological tumours (Figure 2). Where essential data on cancer diagnosis and treatment are missing or incomplete from the electronic data sources, information is manually abstracted from local hospital notes and patient management systems. Yorkshire Register data are cross-checked annually and validated against other data sets held by NCRAS 5 for the purposes of quality assurance.
All addresses and postcodes at diagnosis are verified using the Office for National Statistics (ONS) National Statistics Postcode Lookup 19 to ensure geographical eligibility. Each postcode is mapped to a small area census code and assigned to a census enumeration district or Output Area, and then aggregated up into lower super output areas, county districts, counties or Clinical Commissioning Groups within the Yorkshire region, dependent on the geographical level of analysis. This permits the characterization of geographical areas by social class, ethnic group and other variables such as population migration at different scales using census data.
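The postcode verification and aggregation steps above can be sketched as a simple lookup. This is a minimal illustration only: the `LOOKUP` table stands in for an extract of a postcode lookup file, the area codes are placeholders rather than real ONS identifiers, and `REGISTER_DISTRICTS` is an assumed (incomplete) list of districts within the Register's catchment.

```python
from collections import Counter
from typing import Optional

# Hypothetical extract of a postcode lookup: postcode -> area codes.
# The codes below are placeholders, not real ONS identifiers.
LOOKUP = {
    "LS29JT": {"output_area": "OA_0001", "lsoa": "LSOA_001", "district": "Leeds"},
    "S102TH": {"output_area": "OA_0002", "lsoa": "LSOA_002", "district": "Sheffield"},
    "NE17RU": {"output_area": "OA_0003", "lsoa": "LSOA_003", "district": "Newcastle upon Tyne"},
}

# Assumed subset of districts inside the Register's geographical area.
REGISTER_DISTRICTS = {"Leeds", "Sheffield"}


def geography(postcode: str) -> Optional[dict[str, str]]:
    """Return the area codes for a postcode, or None if it is not found."""
    return LOOKUP.get(postcode.replace(" ", "").upper())


def is_eligible(postcode: str) -> bool:
    """True if the postcode at diagnosis falls within the covered districts."""
    area = geography(postcode)
    return area is not None and area["district"] in REGISTER_DISTRICTS


def count_by(postcodes: list[str], level: str) -> Counter:
    """Aggregate verified postcodes to a chosen level, e.g. 'lsoa' or 'district'."""
    areas = (geography(p) for p in postcodes)
    return Counter(a[level] for a in areas if a is not None)
```

Choosing the `level` argument at analysis time mirrors the approach described above, where the same verified postcode can be rolled up to whichever geography the analysis requires.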

Additional registration details
All diagnoses on the Yorkshire Register are coded according to ICD-O versions 2 and 3 (based on ICD-10/ICD-11) using morphology and site, and grouped according to the International Classification of Childhood Cancer (third edition) (ICCC-3) 20 (Table 2). Copies of diagnostic pathology reports, cytogenetic and molecular genetic diagnostics are retained to provide comprehensive information on diagnosis and facilitate future research should diagnostic classifications change. Notifications of relapse (including date of relapse) and secondary malignant neoplasms (SMNs) are obtained via flagging and linkage with NCRAS. 5 If a patient is diagnosed with an SMN while under the age of 25 years and resident in Yorkshire, then this information is normally also acquired directly from the principal treatment centres (LTHT and SCH) in Yorkshire as part of the core data collection (Table 2). Data received from NCRAS include date of subsequent tumour diagnosis and diagnostic group, coded to ICD-O-2 or ICD-O-3 (depending on the date of diagnosis). SMNs are defined as a malignant neoplasm of any site with a different morphology from that of the primary tumour, regardless of time since diagnosis, according to the recommended coding of multiple primary cancers. 21,22

Additional data linkages (enhanced treatment information and death notifications)
Enhanced treatment information on chemotherapy and radiotherapy is obtained through linkage with NCRAS, 5 including the national Systemic Anti-Cancer Therapy (SACT) 23 and Radiotherapy data sets, 24 as well as hospital electronic prescribing systems such as ChemoCare. 25 ChemoCare extracts are updated annually and SACT every 2 years for use as part of continual data validation exercises for the Register and to facilitate a programme of research comparing the chemotherapy doses and intensities given to individuals and subsequent effect on outcomes. Figure 2 presents the Yorkshire-wide data flow process describing the linkage of Yorkshire Register data with other registry and healthcare databases.
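The SMN definition used in this profile (a malignant neoplasm with a different morphology from that of the primary tumour, regardless of time since diagnosis) can be sketched in Python as follows. This is a simplified illustration, not the Register's implementation: it compares the four-digit ICD-O histology codes directly, whereas the recommended multiple-primary rules compare broader morphology groups. The `Tumour` fields and example records are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Tumour:
    person_id: int
    diagnosis_date: date
    morphology: str  # ICD-O histology/behaviour code, e.g. "9650/3"


def flag_smns(tumours: list[Tumour]) -> list[Tumour]:
    """Return later tumours that are malignant and differ in histology
    from the same person's first (primary) tumour.

    Simplifying assumption: histology codes are compared exactly rather
    than by morphology group.
    """
    by_person: dict[int, list[Tumour]] = {}
    for t in sorted(tumours, key=lambda t: t.diagnosis_date):
        by_person.setdefault(t.person_id, []).append(t)

    smns: list[Tumour] = []
    for records in by_person.values():
        primary_histology = records[0].morphology.split("/")[0]
        for later in records[1:]:
            histology, behaviour = later.morphology.split("/")
            # Behaviour code 3 denotes a malignant primary tumour in ICD-O.
            if behaviour == "3" and histology != primary_histology:
                smns.append(later)
    return smns
```

A subsequent record with the same histology as the primary tumour is not flagged here, which is how the sketch separates a possible relapse from a new malignancy.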
Follow-up information is derived from electronic feeds from PPM 18 and NCRAS 5 (Figure 2). The Yorkshire Register is linked to ONS Death Registration Data, 26 considered the gold standard for mortality data in the UK. 27 Information is provided for deaths occurring in England, whether the individuals are dead, embarked or untraceable; death certificates are also sent to us listing cause of death and place of death (Table 3). Currently we estimate that <0.1% of all individuals on the Yorkshire Register have been lost to follow-up based on the aforementioned cross-checking with NCRAS and the Personal Demographics Service (NHS Digital). 28

Data resource use
Linking information on outcomes from secondary care [obtained from HES data including Admitted Patient Care (APC), outpatient, accident and emergency, and mental health admissions] has facilitated a range of published studies into aetiology, patterns of care and treatment vs outcomes. 22,30-37 Linked HES APC data have been used to investigate the long-term sequelae of cardiovascular disease, 22,32 respiratory morbidity 30 and SMNs. 20 The Yorkshire Register has been instrumental in supporting a body of research that identified poor treatment outcomes for cancer in teenagers and young adults aged 13-24 years. 38 Findings had major implications in substantiating new NHS policy in 2005, which led to the introduction of specialized teenager and young-adult cancer services. From 2016 onwards, this has led to gradual improvement in the health outcomes for teenagers and young adults. 37 There are plans to continue the internationally recognized programme of research on childhood and young-adult cancer outcomes using the Yorkshire Register database.
Current and future projects include investigations into mental health outcomes, fertility problems and cardiometabolic diseases in the long-term survivor population. We aim to identify how these outcomes vary by demographic factors as well as type of malignancy and treatment received. Through novel data linkage to the National Pupil Database 6 (Table 2) we also plan to investigate how the educational trajectory of individuals is affected by a cancer diagnosis in the childhood and young-adult years.
A bibliography of all peer-reviewed published studies using Yorkshire Register data can be found in the Appendix.

Strengths and weaknesses
With detailed demographic, clinical and follow-up data on >11 000 individuals stretching back almost 50 years and linkages to multiple NHS and other routine data sets, the Yorkshire Register research database provides an invaluable and unique population-based data resource for researchers, clinicians and commissioners to further understand the causes and outcomes of cancer in young people.
The accuracy, completeness and comprehensiveness of the clinical and socio-demographic data held on children, teenagers and young adults with cancer in Yorkshire are very strong. The Yorkshire Register is the only specialist database of its kind in England that covers all individuals diagnosed with cancer under the age of 30 years. The demographic and ethnic profile of Yorkshire, in conjunction with validated postcode at diagnosis and the ascertainment of complete and accurate ethnicity data from multiple sources, enables us to explore crucial differences in incidence and prognosis for specific ethnic minority groups or individuals from socio-economically deprived areas 36-38 where national estimates are unavailable. 41 Whilst the data collected are limited to the Yorkshire region, intelligence generated is of benefit to national and international health service and research partners with whom we increasingly collaborate to support improvements in healthcare and cancer outcomes.
Innovative data linkages with routinely collected administrative data facilitate the Yorkshire Register's world-leading programme of research on childhood and young-adult cancer outcomes, extending into groundbreaking areas of social and psychosocial morbidity. Internationally, the data linkages employed by the Yorkshire Register have been recognized as exemplary methods of evaluating patient outcomes using routine health data sets. 40 There are some limitations. Given the rarity of certain diagnostic groups, small numbers in this regional resource limit the extent to which detailed subgroup analyses can be undertaken. In addition, data are not available for older teenagers and young adults aged 15-29 years diagnosed before 1990, and linked HES APC data are not available prior to 1996.

Data resource access
Upon receipt of a Data Access Request Form, requests are considered by the Registry Director and Medical Director, the Chair of the Yorkshire Register Scientific Advisory Group, 41 the Registry Data Manager and an independent representative from the LIDA Data Analytics Team, together forming the 'Registry Data Release Panel'. Research proposals will be circulated to all Scientific Advisory Group members (including external expertise) and feedback collated. The decision to approve access will be the responsibility of the Registry Data Release Panel, which needs to be unanimously in favour of the application for data to be released.
Before identifiable data are released to third parties, a signed Data Sharing Agreement is required following confirmation of the relevant permissions in all situations other than release of data to consultants or GPs relating to their own treated individuals (and upon receipt of a signed letter of request). In some cases, release of identifiable data will require completion of an application form and proof of approval from appropriate Research Ethics Committees and confirmation of Section 251 support from the NHS Health Research Authority or proof of informed consent.
A copy of the Data Access Request Form and more information can be found on the website (https://ysrccyp.org.uk/research/data-requests/).

Ethics approval
The research work of the Register is undertaken with full ethical approval. Approval was originally obtained from the Northern and

Data availability
See 'Data resource access', above. The data are not publicly available due to privacy or ethical restrictions. The data that support the findings of this study are available on request from the corresponding author (subject to review, with the appropriate ethical and information governance approvals).

Supplementary data
Supplementary data are available at IJE online.

Author contributions
K.C. extracted the data, performed statistical analysis and drafted the manuscript with support from P.C. P.C. contributed to data collection and validation. R.F. is the research programme lead for the Yorkshire Register. A.G. is the Medical Director of the Yorkshire Register. A.G., S.M., N.H., D.S., A.J. and D.G. provide invaluable clinical input that helps shape the work of the Register. R.F., A.G., K.C., P.C., N.H. and S.M. are responsible for facilitating linkages between the Yorkshire Register and various health- and non-health-related data sets. C.S. and R.M. are long-standing members of the Yorkshire Register Scientific Advisory Group and provide independent epidemiological advice and guidance that shape the activities of the Register. All authors provided critical feedback on the manuscript.

Funding
The Yorkshire Register is supported by competitively obtained grants from the Leeds Candlelighters' Trust since 1984 (grant number RG.EPID.100811) and the Laura Crane Youth Cancer Trust between 2017 and 2020 (grant number RG.LIGH.126299). K.C. is funded by the Emma and Leslie Reid Research Scholarship (University of Leeds). The funders were not involved in any aspect of the study design, the collection, analysis and interpretation of data or in writing the manuscript.