Creating a big database: how to link data from different sources

One of the selected DASH projects that received dedicated support in 2019 is the Database project of the Center of Rehabilitation (Beatrixoord). The Center of Rehabilitation is creating a database that researchers can access for their studies, with the goal of improving the rehabilitation process.

In the rehabilitation process, patients perform tests, fill out questionnaires and follow a program consisting of physical and psychosocial interventions. All of these activities generate data, which are stored in different formats and in different locations. “Because the data we collect are stored in different locations, it is difficult to obtain all the relevant information to answer a research question,” says Diana van Dijken, one of the people leading the project. “The aim of our project is to link the data from the different sources and to save them in one database, so that they can be used for (big data) research.”

The challenge 

The disparity of the data is a big challenge for researchers and their analyses: the multiple sources have to be accessed separately, and the data come in seemingly incompatible formats. On top of that, patient data are sensitive, so it is of great importance that the data cannot be traced back to a patient. The different data of one patient should therefore be linked in a way that does not allow the patient to be identified, and all these sets of patient data should be stored in a database that is both safe and easily accessible.

The solution 

“DASH assisted us in extracting data from different sources and investigated which IT and data management functionalities are needed to create this big data structure. With the structure, we will be able to generate datasets from our patient data that can be safely used by our researchers,” Diana van Dijken explains. Funded by DASH and in cooperation with IM-Onderzoek, these IT and data management functionalities are being implemented. They include a data warehouse that makes the different data sources safely accessible through a pseudonymization service, integrates the data at the patient level, and provides researchers with the data in a controlled manner in a Virtual Research Workspace environment.
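To make the idea of linking data at the patient level through a pseudonym more concrete, here is a minimal sketch in Python. It is not the UMCG pseudonymization service or the project's data warehouse. The key, the function names, the source names and the sample records are all hypothetical, and a keyed hash (HMAC-SHA256) is only one possible way to derive a stable pseudonym.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret key (assumption): in a real setup it would be held
# only by the pseudonymization service, never by researchers.
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from a patient identifier with a keyed hash."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def link_sources(sources: dict[str, list[dict]],
                 id_field: str = "patient_id") -> dict[str, dict]:
    """Merge records from several sources into one record per pseudonym.

    The identifying field is dropped; every other field is kept and
    prefixed with the name of the source it came from.
    """
    linked: dict[str, dict] = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            pseudonym = pseudonymize(record[id_field])
            for field, value in record.items():
                if field != id_field:
                    linked[pseudonym][f"{source_name}.{field}"] = value
    return dict(linked)


# Made-up sample data standing in for two separately stored sources.
sources = {
    "tests": [
        {"patient_id": "P001", "walk_test_m": 420},
        {"patient_id": "P002", "walk_test_m": 365},
    ],
    "questionnaires": [
        {"patient_id": "P001", "pain_score": 4},
    ],
}

linked = link_sources(sources)
for pseudonym, record in linked.items():
    print(pseudonym, record)
```

Because the same identifier always maps to the same pseudonym, records from both sources end up in one combined record per patient without the original identifier appearing in the result. The sketch removes only the direct identifier: other fields can still be identifying in combination, and a later record from the same source overwrites an earlier one for the same patient, so a real implementation would need to handle both.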

DASH helped the Center of Rehabilitation project get started with the infrastructure for the database. “Solving our problem would not only be beneficial for us; the solution can also be used by other departments of the UMCG to create their own big databases,” according to Diana van Dijken. The project of the Center of Rehabilitation is still ongoing, and the team is working hard on realizing the database.

IT tools and services 

The IT solutions used in this project are also available to researchers within the UMCG.

•Pseudonymization service: A service that replaces all identifying characteristics with a pseudonym, so that the person cannot be directly identified.

•Virtual Research Workspace: An environment for secure collaboration amongst different researchers (working within and outside the UMCG) for analyzing data simultaneously and/or sharing privacy-sensitive data. 
