Healthcare data is growing faster than ever, but much of its value remains unused. We combine clinical codes and free-text data to generate actionable population insights, using natural language processing, predictive modelling, and our FAIR-compliant CRISP-PIP framework to guide the entire process.
We develop data-driven methods to combine structured and unstructured healthcare data for innovation in primary care research and beyond. Structured data, such as International Classification of Primary Care (ICPC) codes and medication records, provide standardised clinical information, while unstructured text (e.g. physicians' notes or survey fields) contains rich contextual detail not captured elsewhere. We use natural language processing (NLP) to process free-text data and extract clinical topics, symptoms, and sentiments. In parallel, we use structured data to define disease patterns and care episodes. Together, these data sources enable robust prediction modelling and the development of Population Insight Products (PIPs). We have already applied NLP to over 3 million general practice contacts, extracting key clinical topics. We are now evaluating the added predictive value of unstructured data over structured data alone, and exploring how the combination improves insight, prediction, and personalisation of care.
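As a rough illustration of what combining the two data types can look like, the sketch below merges ICPC code indicators with topic counts taken from a free-text note into a single feature set. It is a minimal example under stated assumptions, not our actual pipeline: the keyword lexicon, the function names, and the example contact are all hypothetical, and simple keyword matching stands in for the NLP methods described above.

```python
from collections import Counter

# Hypothetical lexicon mapping free-text terms to clinical topics.
# A real NLP step would be far richer than keyword lookup.
TOPIC_KEYWORDS = {
    "cough": "respiratory",
    "wheezing": "respiratory",
    "fatigue": "general",
    "anxious": "psychological",
}


def extract_topics(note: str) -> Counter:
    """Count topic mentions in a free-text note via keyword matching."""
    tokens = (tok.strip(".,;:!?()").lower() for tok in note.split())
    return Counter(TOPIC_KEYWORDS[t] for t in tokens if t in TOPIC_KEYWORDS)


def build_features(icpc_codes: list[str], note: str) -> dict[str, int]:
    """Merge structured code indicators with text-derived topic counts."""
    # Structured part: one binary indicator per recorded ICPC code.
    features = {f"icpc:{code}": 1 for code in icpc_codes}
    # Unstructured part: how often each topic is mentioned in the note.
    for topic, count in extract_topics(note).items():
        features[f"topic:{topic}"] = count
    return features


# Illustrative, made-up contact: one ICPC code plus a short note.
contact = {
    "icpc_codes": ["R05"],  # R05 is the ICPC code for cough
    "note": "Patient reports persistent cough and wheezing, feels anxious.",
}
features = build_features(contact["icpc_codes"], contact["note"])
```

In this example the note contributes a psychological topic that the recorded code alone does not capture, which is the kind of added signal the combined approach is meant to surface. Features built this way could then be passed to a prediction model and compared against a model that uses the structured indicators only.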
Our work is guided by the CRISP-PIP framework, a stakeholder-driven, FAIR-compliant data science workflow supporting the full trajectory from defining a research question to building and implementing data-driven products in real-world settings. This framework was developed in co-creation with Datapoort.
Relevance
How our research benefits society
What we want to achieve
We aim to make full use of routinely collected healthcare data, both structured and unstructured, to create meaningful insights that support personalised care, prevention, and timely intervention. Our approach enables more complete and accurate analysis than using either data type alone.
What our impact on patient care is
By integrating structured clinical data with free-text sources such as general practitioners' notes, we aim to identify patient needs earlier and more precisely. This may support better triage, risk assessment, and patient follow-up in general practice, contributing to more efficient and targeted care delivery.
How we translate our findings into care and policy
With the CRISP-PIP framework, we provide a reproducible and transparent method for translating complex data analyses into Population Insight Products that inform care improvement, health system planning, and data-driven policy.