NELSON dataset

The NELSON dataset originates from the NELSON study, a Dutch-Belgian population-based, randomised controlled lung cancer screening trial.
The Dutch-Belgian Randomized Lung Cancer Screening Trial (NELSON study) was designed to investigate whether screening for lung cancer by low-dose multidetector computed tomography (CT) in high-risk subjects leads to a decrease in ten-year lung cancer mortality of at least 25% compared with a control group without screening.

The primary objective of the NELSON trial is to investigate whether screening for lung cancer by 16-detector multi-slice CT with 16 × 0.75 mm collimation and 15 mm table feed per rotation (pitch = 1.5) in years 1, 2 and 4 leads to a decrease in lung cancer mortality in high-risk subjects of at least 25% compared with a control group that receives no screening.

  • Participants were randomized 1:1 to a screening group and a control group. Subjects in the screening group were periodically invited to undergo low-dose computed tomography (CT) scans of the chest. Both groups (the screened participants and the controls) filled out questionnaires at the start of the trial and at the follow-up after 10 years.

    More specifically, the population comprised people born between 1928 and 1956, originating from the Netherlands and Belgium, who:
    •    smoked > 15 cigarettes/day for > 25 years,
    •    or smoked > 10 cigarettes/day for > 30 years,
    •    and were current smokers or former smokers who quit smoking ≤ 10 years ago.
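
    The inclusion rule above combines two alternative exposure thresholds with a recency condition. A minimal sketch of that logic follows; the record type and its field names are hypothetical (they are not NELSON database variables), and treating the birth-year bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SmokingHistory:
    """Hypothetical record; field names are illustrative, not NELSON variables."""
    birth_year: int
    cigarettes_per_day: float
    years_smoked: float
    current_smoker: bool
    years_since_quitting: float = 0.0  # ignored for current smokers


def meets_inclusion_criteria(p: SmokingHistory) -> bool:
    """Return True if the record satisfies the inclusion criteria listed above."""
    # Assumption: "born between 1928 and 1956" includes both end years.
    born_in_range = 1928 <= p.birth_year <= 1956
    # Either exposure threshold is sufficient.
    heavy_exposure = (
        (p.cigarettes_per_day > 15 and p.years_smoked > 25)
        or (p.cigarettes_per_day > 10 and p.years_smoked > 30)
    )
    # Current smokers, or former smokers who quit at most 10 years ago.
    recent_smoker = p.current_smoker or p.years_since_quitting <= 10
    return born_in_range and heavy_exposure and recent_smoker
```

    For example, a former smoker born in 1950 who smoked 12 cigarettes/day for 35 years and quit 5 years ago would pass this check, while the same person having quit 12 years ago would not. The exclusion criteria are not covered by this sketch.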

    Excluded were those with moderate or bad self-reported health who were unable to climb two flights of stairs, those with a body weight ≥ 140 kg, those with current or past renal cancer, melanoma or breast cancer, and those with lung cancer diagnosed less than 5 years ago, or 5 years or more ago but still under treatment. Those who had a chest CT examination less than one year before they filled in the first NELSON questionnaire were also excluded.

    Participants from the screening group were invited for a diagnostic test at years 1, 2, 4, and 6 [2], as shown in the figure. No scans were made in the control group as part of the trial.


    Screening test results that were indeterminate [a] led to a short-term follow-up CT scan 3-4 months later (not shown in the figure).

    [a]: Oval-shaped growths in the lungs are called lung nodules. Depending on the nodule’s shape, size, and texture this abnormal growth of tissue can be potentially cancerous (malignant). To determine the outcome of the test, the diameter d, the volume V, and the doubling time 𝜏 of the nodules are compared to criteria outlined in Table 1 in Ref. [3]. 
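
    As an illustration of how a doubling time 𝜏 can be obtained from two volume measurements, the sketch below uses the standard exponential-growth formula 𝜏 = Δt · ln 2 / ln(V2/V1). This is an assumption about the general technique; the exact procedure and the decision thresholds used in the trial are those given in Ref. [3]. The function and parameter names are hypothetical.

```python
import math


def volume_doubling_time(v1_mm3: float, v2_mm3: float, interval_days: float) -> float:
    """Doubling time in days, assuming exponential growth between two scans.

    v1_mm3 and v2_mm3 are the nodule volumes at the earlier and later scan,
    and interval_days is the time between the two scans.
    """
    if v1_mm3 <= 0 or v2_mm3 <= 0 or interval_days <= 0:
        raise ValueError("volumes and interval must be positive")
    log_growth = math.log(v2_mm3 / v1_mm3)
    if log_growth == 0:
        # No change in volume: the nodule never doubles.
        return math.inf
    # A negative result indicates a shrinking nodule.
    return interval_days * math.log(2) / log_growth
```

    A nodule that grows from 100 mm³ to 200 mm³ in 90 days has, by this definition, a doubling time of 90 days; slower growth gives a longer doubling time.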

  • The NELSON dataset consists of digital and non-digital data. The digital data comprise (1) an imaging set made up of computed tomography (CT) chest scans of the screening group and (2) phenotype data in alphanumeric form that specify the participants’ characteristics, test results, and CT scan annotations. Lung function data and blood/tissue samples are available for a subset of the participants. Blood is stored in EDTA and PAXgene tubes.

  • Apart from the collection of CT scans, the dataset contains phenotype information on the subjects in alphanumeric form. Most notably, the set contains the test result from each screening, questionnaires filled out by the participants, and annotations of detected nodules, as outlined below.

    Lung nodule annotations

    CT scans are supplemented by lung nodule annotation data. The lung nodule annotations were either i) generated with the help of LungCare Software, or ii) measured manually where the software segmentation was inadequate [1]. The following nodule information was recorded in the database for solid nodules without a benign calcification pattern:


    Nodules that did not show a benign pattern of calcification (so-called non-calcified solid nodules) were further characterised [2].

    Table 2: Shaded cells are annotated by the radiologist, while non-shaded cells are calculated by software unless manually adjusted by a radiologist (to correct faulty nodule segmentation).


    For the definitions of the categories, and the precise details of how these quantities were calculated, the reader is referred to Ref. [2].

Harry Groen, Pulmonologist

