Data-driven Innovations in Primary Care

From Data Mining to Population Insights Products (PIP)
Healthcare data is growing faster than ever—but much of its value remains unused. We combine clinical codes and free-text data to generate actionable population insights, using natural language processing, predictive modelling, and our FAIR-compliant CRISP-PIP framework to guide the entire process.

We develop data-driven methods to combine structured and unstructured healthcare data for innovation in primary care research and beyond. Structured data, such as International Classification of Primary Care (ICPC) codes and medication records, provide standardised clinical information, while unstructured text (e.g. physicians' notes or free-text survey fields) contains rich contextual detail not captured elsewhere.
We use natural language processing (NLP) to process free-text data and extract clinical topics, symptoms, and sentiments. In parallel, structured data is used to define disease patterns and care episodes. Together, these data sources enable robust prediction modelling and the development of Population Insight Products (PIPs).
We have already applied NLP to over 3 million general practice contacts, extracting key clinical topics. We are now evaluating the added predictive value of unstructured data over structured data alone, and exploring how the combination improves insight, prediction, and personalisation of care.

Our work is guided by the CRISP-PIP framework, a stakeholder-driven, FAIR-compliant data science workflow supporting the full trajectory from defining a research question to building and implementing data-driven products in real-world settings. This framework was developed in co-creation with Datapoort.

Relevance

How our research benefits society

What we want to achieve

We aim to make full use of routinely collected healthcare data, both structured and unstructured, to create meaningful insights that support personalised care, prevention, and timely intervention. Our approach enables more complete and accurate analysis than using either data type alone.

What our impact on patient care is

By integrating structured clinical data with free-text sources, such as general practitioners' clinical notes, we can identify patient needs earlier and more precisely. This may support better triage, risk assessment, and patient follow-up in general practice, contributing to more efficient and targeted care delivery.

How we translate our findings into care and policy

With the CRISP-PIP framework, we provide a reproducible and transparent method for translating complex data analyses into population insight products that inform care improvement, health system planning, and data-driven policy.

Contact

Wikje Berends-Hoekstra, PhD Candidate, Data Scientist

FemHealthData
Eerstelijnsgeneeskunde en Langdurige Zorg UMCG
Huispostcode FA21
Postbus 196
9700 AD Groningen