Rare diseases: or, what happens when backups fail

A word with your data backup. ;;tewpitqg[[[nmcappoegapdgud. No? Thought so.
Ploidy is a funny old Greek word for a modern, almost modish concept: data backups. Ploidy is the number of copies of an organism’s DNA. Certain kinds of algae, for instance, don’t buy into backups; they’re haploid, and contain only a single copy of DNA. Many plants, on the other hand, are polyploid; paranoiacs of the genetic realm, they contain several, and sometimes dozens, of copies of DNA. Us, we’re in the middle: human beings are diploid, with each of our cells containing two copies of DNA, one from mom, the other from dad.

Two copies of anything is reassuring. This article is diploid: one copy glares at me from my screen, and the other scooches in a server somewhere on the cloud. And so when my cat, also diploid, walks across my keyboard, deleting most of my article and transmuting the rest into gibberish, my annoyance at her is mostly for show. Diploidy, it saves pet cats: who knew?

The similarity between cloud backups and diploidy in humans cuts deeper than jokes about cats, however. Much like data, all DNA is a communiqué, a myth that is passed on from generation to generation. Mom and dad come together to write an epic. Mom heard a version of this epic from her parents; Dad heard a slightly different version from his. You, the stuff of you, are both epics, side by side. The 3-billion-year hope is that these subtly different epics, coming together, help you thrive in your time, allowing you to craft bear-cleaving augers in the Iron Age or write a fun blog post in the Age of the Cloud.

Genetic diversity is a happy offshoot of diploidy; if DNA is an epic, its mishearings can be fruitful. Maybe dimpled chins run in your family. Your mom has one, and so does her mom before her. Say dimpled chins arise from a genetic mutation named D; the absence of this mutation is N, say. Then your mother’s genes in the bits of her DNA responsible for chins, dimpled or otherwise, might look like this:

Figure T: Your mom gets her movie-star chin from your grandma.

Figure T notates the two copies of DNA your mom carries with a pair of letters separated by a pipe. The letter to the left of the pipe is the gene from her mom, and the one to the right is from her dad. And so, in the case of dimpled chins, a single mishearing—a single mutated copy—is all your mom needs to be the perpetual envy of her generation, because—you got it—dimpled chins are exotic in this part of the world.

Traits like dimpled chins manifest when only a single copy, out of two, is mutated; such traits are dominant traits. Dominant traits are passed on easily. Carrying the example further, say your father’s line doesn’t contain the D gene at all: he’s N|N, nought for nought in the chin department. What’s the probability you’ll end up looking like a matinée idol? Here’s a Punnett square for reference:

Figure P: A Punnett Square. You have even odds of a sultry face.

Figure P depicts a typical approach to inferring the probability of inheriting a trait: list all possible combinations of genes, and find the number of times the outcome favouring the trait occurs. In this case, the dimpled chin outcome D|N occurs 2 out of 4 times, implying a 50% chance that your chin, like your mother’s and grandmother’s, will be dimpled.
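
The counting in Figure P is simple enough to hand to a computer. Here is a minimal Python sketch of it; the function names `punnett` and `trait_probability` are made up for illustration, and the `dominant` flag is an added convenience for traits that need both copies mutated.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(mother, father):
    """Count each child genotype from two parental genotypes like ("D", "N")."""
    # Each cell of the square pairs one copy from mom with one copy from dad.
    return Counter(f"{m}|{f}" for m, f in product(mother, father))


def trait_probability(mother, father, allele="D", dominant=True):
    """Probability that a child shows the trait linked to `allele`."""
    cells = list(product(mother, father))
    if dominant:
        # A dominant trait shows with at least one mutated copy.
        hits = sum(1 for cell in cells if allele in cell)
    else:
        # A recessive trait needs both copies mutated.
        hits = sum(1 for cell in cells if all(a == allele for a in cell))
    return Fraction(hits, len(cells))


mom, dad = ("D", "N"), ("N", "N")
print(dict(punnett(mom, dad)))      # {'D|N': 2, 'N|N': 2}
print(trait_probability(mom, dad))  # 1/2
```

The square has four cells, two of which are D|N, which is where the even odds come from.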

Not all traits are dominant; those that require both copies to be mutated are called recessive traits. Recessive traits are rare. Masked by their dominant counterparts, they show up once in a few generations, and can be unwelcome guests. Albinism, a complete absence of melanin in the skin, is a recessive trait. In tropical, melanin-rich India, albinism isn’t just a disability, accompanied as it is by a host of vision defects and an increased risk of skin cancer: it also mobilises a certain deep-seated mistrust of incongruous skin colour.

And so, even as diploidy in human beings guarantees diversity, it brings about the possibility of disfiguration. Just like traits, most genetic diseases can be described as either dominant or recessive; and just like traits, genetic diseases, corrupt hearings of the DNA epic, are passed on from generation to generation. Huntington’s disease, a progressive wasting away of brain cells, incurable and invariably fatal, is dominant, requiring only one parent to pass on a defective copy. Sickle cell anaemia, another incurable genetic disease, is most common in Africa and the Indian subcontinent, and is characterised by an all-pervading pain that persists throughout a dramatically reduced lifespan. It is recessive, and only manifests if both parents pass on a defective copy to their child.

Does a person carry one or two copies of a defective gene? Distinguishing a heterozygous mutation, i.e., one that occurs on only a single copy, from a homozygous mutation, one that occurs on both copies, is a critical component of Strand’s clinical workflow.

The workflow begins with the patient, who donates a sample of his tissue or blood for analysis. Reads sequenced from his genome are subject to alignment, in which they’re mapped to the genome of a healthy individual for reference. Figure S, for instance, depicts an A->G heterozygous substitution in StrandNGS.

Figure S: An A to G substitution. 10 out of 16 reads (62%) support the G substitution, and belong to one of the two copies of DNA. The rest support the reference, and belong to the other, non-mutated copy. This is a heterozygous location.

The healthy reference base A is swapped for a G on one of the copies, while the other copy retains the reference. Each read at the mutated location reflects either the G mutation or the healthy A. If we assume, reasonably, that about half the reads came from the mother’s copy, and the other half from the father’s, then heterozygous mutations like this one will have a read support of approximately fifty percent.

Homozygous mutations require greater evidence. If both copies of DNA are mutated at a given location, then, roughly speaking, anywhere between 85 and 100% of sequenced reads at that location should support the mutation. Figure H shows a homozygous G->A substitution; both mother and father carry the mutated copy of the gene. Notice that most, but not all, reads support the substitution.

Figure H: A G->A substitution. All but one read support the substitution, for a read support of 98%. The location is homozygous.
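
As a rule of thumb, the two cases can be written down as a tiny classifier. What follows is a sketch under stated assumptions, not how StrandNGS actually makes the call: the 85% floor for homozygous comes from the range quoted above, while the 30 to 70% window around fifty percent for heterozygous is an assumption picked purely for illustration. The function names are hypothetical too.

```python
def read_support(alt_reads, total_reads):
    """Fraction of reads at a location that carry the mutation."""
    if total_reads <= 0:
        raise ValueError("need at least one read")
    return alt_reads / total_reads


def call_zygosity(alt_reads, total_reads, het_band=(0.30, 0.70), hom_min=0.85):
    """Rough zygosity call from read support alone.

    hom_min follows the 85-100% range in the text; het_band is an
    assumed window around 50%, chosen only for illustration.
    """
    support = read_support(alt_reads, total_reads)
    if support >= hom_min:
        return "homozygous"
    if het_band[0] <= support <= het_band[1]:
        return "heterozygous"
    # Anything between the two windows (or below both) is left undecided.
    return "ambiguous"


print(call_zygosity(10, 16))  # Figure S: 10 of 16 reads -> heterozygous
print(call_zygosity(49, 50))  # hypothetical counts giving 98% -> homozygous
```

Read support that falls between the two windows comes back as "ambiguous", which is exactly the kind of location that needs a closer look.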

The Strand clinical workflow has a panel for the detection of rare genetic diseases. Genetic diseases can be dominant or recessive. For dominant diseases, it doesn’t much matter if the mutation is heterozygous or homozygous: both imply similar prognoses and indicate similar therapies. But recessive diseases are different.

Take Wilson’s disease, a genetic disorder resulting from copper accumulation in the liver and brain. Wilson’s presents as a baffling agglomeration of vague symptoms: vomiting, fatigue, high blood pressure, clumsiness, depression, tremors. Diagnosis is protracted and concludes with an invasive liver biopsy; treatment involves the simple expedient of avoiding copper-rich foods, like oysters and chocolate (though avoiding chocolate is never easy).

Wilson’s disease is rare, occurring in roughly one in 30,000 people; and, unusually for a genetic disease, it can be diagnosed in the lab and treated at home. The problem lies in the nature of the disease. Because Wilson’s disease is genetic, and because its symptoms can manifest anytime between the ages of 5 and 35, a positive diagnosis implies a familial link: if you have it, then so can your children. Also, because Wilson’s is recessive, it requires the successful detection of a homozygous mutation, i.e., a defect on both backup copies of the DNA. A heterozygous mutation would mean, on the other hand, that while the patient is a carrier, he will never suffer from the disease himself.

Is a mutation homozygous or heterozygous? The call isn’t always easy to make. Here, for instance, are reads at a location with a single inserted base. Out of 16 reads, 13 contain the insertion, with the balance supporting the reference.

Figure I: Reads at a location containing a potential insertion. Purple denotes an insertion, while gray denotes a reference base. 13/16 reads, or 81%, support the insertion.

That’s a read support of 13/16 = 81% for the insertion. You might say it’s homozygous, because 81% is close to 100%; but it isn’t all that far from 50% either. It might be heterozygous, too; the sequencer could just have sampled more reads from one copy of the chromosome than the other. Is there a way to know for sure?
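
A quick calculation puts a number on that doubt. Assume, as a simplification, that at a heterozygous location each read is drawn independently from either copy with equal probability, and ignore sequencing and alignment errors entirely. Then the chance of seeing 13 or more reads out of 16 from one copy is a binomial tail. The helper name `prob_at_least` is made up for this sketch.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that a heterozygous site shows the insertion in 13+ of 16 reads.
p_het = prob_at_least(13, 16)
print(f"{p_het:.4f}")  # 0.0106
```

About one in a hundred: unlikely, but with thousands of locations in play, not nearly rare enough to rule heterozygous out on read counts alone.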

Notice that some of the reads that support the reference “look” wrong; they seem to support far too many mismatches along their length. Here are two of the supposed reference-supporting reads, with mismatching bases circled:

Figure M: Reads with many mismatching bases.

The first read contains a mismatching A and T, while the second read contains a single mismatching T at the end. We now have three events: the original insertion, the mismatching A, and the mismatching T. The odds that all three events occur simultaneously are small. Intuition tells us that the insertion should win out, because it has the bulk of the support. But can we prove it?

The answer lies in realignment. Basically, the mismatching reads in Figure M can be aligned to the insertion: doing so makes the errors vanish.

Figure R: Mismatching reads in Figure M, realigned against the insertion.

Both reads in Figure M have switched loyalties; once supporting the reference, they now support the insertion.
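
The idea behind realignment fits in a few lines of Python. This is a toy sketch, not the algorithm StrandNGS uses: the sequences are invented, the read is assumed to start upstream of the insertion so that it has the same start coordinate on both candidates, and no gaps are allowed within a read. Each read is simply compared against the reference and against the reference with the insertion spliced in, and it sides with whichever explains it with fewer mismatches.

```python
def mismatches(read, haplotype, start):
    """Count mismatching bases when `read` is laid on `haplotype` at `start`."""
    window = haplotype[start:start + len(read)]
    if len(window) < len(read):
        raise ValueError("read runs off the end of the haplotype")
    return sum(1 for a, b in zip(read, window) if a != b)


def realign(read, start, reference, alt):
    """Assign a read to whichever candidate explains it with fewer mismatches."""
    ref_mm = mismatches(read, reference, start)
    alt_mm = mismatches(read, alt, start)
    if alt_mm < ref_mm:
        return ("insertion", alt_mm)
    # Ties stay with the reference.
    return ("reference", ref_mm)


# Invented sequences: the alternate copy carries one extra A at position 8.
reference = "ACGTACGGTCATTGCA"
alt = reference[:8] + "A" + reference[8:]

read = "GTACGGATC"  # ends just past the insertion
print(mismatches(read, reference, 2))    # 3
print(realign(read, 2, reference, alt))  # ('insertion', 0)
```

Laid on the reference, the read picks up three mismatches at its tail; laid on the copy carrying the insertion, it matches perfectly. That is the switch of loyalties described above.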

Figure H: The insertion has 100% read support and is homozygous.

Before realignment, the insertion had 81% read support and stood in the shadow zone between heterozygous and homozygous; after realignment, the insertion has a hundred percent support and is unequivocally homozygous. Nature’s backup has failed; both copies of the chromosome contain the mutation. If this were the mutation corresponding to Wilson’s disease, or sickle cell anaemia, or Huntington’s, a discovery like this would amount to a diagnosis.

The difference between heterozygous and homozygous, between dominant and recessive, between a ploidy of one and that of two, isn’t just cavilling, isn’t mere arithmetic. Wilson’s disease is a happy example of a curable recessive disease; there are other, less curable ones, like cystic fibrosis, where not just a person but an entire lineage is under threat. In cases like these, the line between one and two is also the line between suspicion and certainty. Among its many other functions, the Strand clinical workflow helps sharpen that line.