The Need for Realignment to dbSNP

We’ve been using Avadis NGS to analyze a number of clinical samples and ran into an interesting case for InDel detection that could lead to false interpretation unless handled properly. The case at hand involves a child who suffered from Pulmonary Hypertension, Pulmonary Infections, and a few other abnormalities, all at birth or in the first year of life, and subsequently died in the second year. We had DNA from this child as well as from his (consanguinious) parents. All three were subjected to Exome sequencing.

We used Avadis NGS to align the data and then performed variant calling. Next, we used the Find Significant Variants functionality in Avadis NGS to identify variants which were heterozygous in the parents but homozygous in the child, and then restricted these variants to those with allele frequencies below 1% using the Find Damaging Variants operation. This yielded a hundred or so gene candidates. We then picked out genes known to cause rare childhood diseases.

One such gene was ALMS1, which is involved in Alstrom Syndrome, a disease that is known to be associated with the occurrence of recurrent pulmonary infections. The mutation in ALMS1 was a CTC insertion, heterozygous in both parents and homozygous in the child. This mutation causes the insertion of an extra amino acid P (=CCT). It also appeared to be a novel mutation not reported in dbSNP, warranting further investigation.

By habit, we also look at dbSNP variants in the vicinity of our mutation of interest just so we can explain the presence of common polymorphisms, if any, close to our potentially deleterious mutation. In this case, there was indeed a dbSNP insertion close-by, as shown in the picture below.

This dbSNP variant had the exact same insertion sequence, i.e., CTC, but appeared 3 bases away from our mutation. And it so happened that the 3 intervening characters were CTC as well! In other words, our mutation and the dbSNP variant both were different ways of just calling the same variant; CTC was repeated twice and this could be called as an insertion to the left (our mutation) or an insertion to the right (dbSNP). Further, it so happens that the dbSNP variant is a very common one. Therefore our mutation, far from being novel, was quite common and not a candidate for the case at hand!!

This tells us that in addition to local realignment of an InDel across multiple overlapping reads, we also need to realign to dbSNP to identify the true allele frequency of our mutation, an alteration that we will definitely seek to add to Avadis NGS in the near future.

About Ramesh Hariharan

Dr. Ramesh Hariharan is an academic entrepreneur responsible for the software-based technology development and implementation at Strand Life Sciences. He is also the chief architect for all of Strand's products, including the award winning Avadis® platform. He is a recipient of the TR100 Award (2002) of Young Innovators by MIT's Technology Review Magazine and in 2003 received the Global Indus Technovator Award from MIT, instituted to recognize the top 20 Indian technology innovators worldwide. Ramesh is an IIT - Delhi Computer Science alumnus, has a Ph.D. in Computer Science from the Courant Institute of Mathematical Science, New York University and a postdoctoral degree in Computer Science from the Max Planck Institute, Saarbrücken, Germany. His research interests are in sequence analysis, string algorithms, computational biology, computational geometry and foundations of computing.