Octopus combines sequencing reads and prior information to phase called genotypes of arbitrary ploidy, including those with somatic mutations. We show that Octopus accurately calls germline variants in individuals, including single nucleotide variants, indels and small complex replacements such as microinversions. Using a synthetic tumor data set derived from clean sequencing data from a sample with known germline haplotypes and observed mutations in a large cohort of tumor samples, we show that Octopus is more sensitive to low-frequency somatic variation, yet calls considerably fewer false positives than other methods. Octopus also outputs realigned evidence BAM files to aid validation and interpretation.
Cancer is a disease of the genome: genetic mutations disrupt the normal functioning of cells, leading to uncontrolled proliferation. For optimal treatment, it can be helpful to know precisely which mutations are present in tumour tissue. This is, however, a technically challenging problem, as tumours are heterogeneous and their genomes may be very complex.
Our research describes a new algorithm and statistical method, called "Octopus", designed to accurately identify genetic variants from short-read sequence data in a variety of contexts, including tumour-normal studies. Previous algorithms were optimized for a single context, usually whole-genome sequencing of a single diploid sample. By contrast, our method can easily be adapted to a range of contexts, such as tumour-normal sequencing and the identification of de novo mutations in offspring using trio data.
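To make the idea of combining reads with prior information for genotypes of arbitrary ploidy more concrete, the following is a minimal sketch of a generic Bayesian genotype calculation. It is not Octopus's actual model or code: the function name `genotype_posteriors`, the input format (one dictionary of per-haplotype read likelihoods per read) and the uniform default prior are all assumptions made purely for illustration.

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_posteriors(read_likelihoods, haplotypes, ploidy, prior=None):
    """Toy posterior over genotypes (hypothetical helper, not Octopus's API).

    read_likelihoods: list of dicts mapping haplotype -> P(read | haplotype).
    haplotypes: candidate haplotype labels.
    ploidy: number of haplotype copies in a genotype.
    prior: optional dict mapping genotype tuple -> prior probability.
    """
    # A genotype is an unordered multiset of `ploidy` haplotypes.
    genotypes = list(combinations_with_replacement(haplotypes, ploidy))
    if prior is None:
        # Assumption: uniform prior when none is supplied.
        prior = {g: 1.0 / len(genotypes) for g in genotypes}

    unnormalised = {}
    for g in genotypes:
        # Assumption: each read comes from one of the genotype's haplotype
        # copies with equal probability, and reads are independent.
        likelihood = prod(sum(r[h] for h in g) / ploidy for r in read_likelihoods)
        unnormalised[g] = prior[g] * likelihood

    total = sum(unnormalised.values())
    return {g: p / total for g, p in unnormalised.items()}


# Made-up example: three reads favour "ref", three favour "alt".
reads = [{"ref": 0.99, "alt": 0.01}] * 3 + [{"ref": 0.01, "alt": 0.99}] * 3
diploid = genotype_posteriors(reads, ["ref", "alt"], ploidy=2)
best = max(diploid, key=diploid.get)  # the heterozygous genotype ("ref", "alt")
triploid = genotype_posteriors(reads, ["ref", "alt"], ploidy=3)
```

Changing `ploidy` only changes which genotypes are enumerated, which is one simple way to see how a single formulation can cover more than the diploid case. A real caller would additionally have to propose candidate haplotypes, model sequencing errors and handle somatic or de novo mutations, none of which is attempted here.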
Read our recent paper: "A unified haplotype-based method for accurate and comprehensive variant calling", Nature Biotech 21