What’s in a number? Small numbers are roguish capsules of triple meaning. The number 7 is the date you were married; it’s the exact number of teaspoons of sugar in a halwa recipe your mom sent you; it’s the age of your eldest son on his next birthday.
As we get older, we become acquainted with the texture of these numbers; we begin to treat them as harbingers, or maybe ports in a storm. We associate them with feeling. I don’t like the number 21. Why? Because my favourite batsman gets out most often facing 21 deliveries. 3 is my lucky number. Why? I don’t know, it just is.
Large numbers are never quite as potent. What’s in the number 23,723? Or 59,338? Nothing, as far as you know. Most of them lose character beyond the hundreds place. Give me a man who murmurs “That train has 138 bogies,” and I’ll give you ten who’d say: “100 bogies is a lot for a train.” (It’s a goods train.)
By the end of this post, you’ll come to know a large number. Not just know it, but know it twice, the first time as too large, the second time as too small.
That number is 143,000.
The nature of mutations
You’ve got DNA; nearly every cell in your body harbours two copies of 3 billion bases. (There’s a characterless large number again; it is, by a strange coincidence, also roughly the number of years since life on Earth began: a base for every year.) Those 3 billion bases, once translated into the right language, make you you. In a very real sense, DNA bases are the building blocks of the biological world; they’re what makes life happen. DNA is transcribed into RNA, which is in turn translated into proteins, the agents of function in the human body. RNA and proteins are all-important, but they’re also just… puppets. It’s DNA that pulls the strings.
You can’t just meddle with DNA. It’s a sunny day, and you’re out watching trains, or perhaps your favourite batsman endure past the twenty-first delivery. You flake some skin, some other skin turns brownish-pink, and within the stern motor of your cutaneous membrane a cell acquires a mutation. What is it? A DNA base has just been altered, a T to a C.
Oh no. Oh wow. Is this how I get cancer? Is this how I get skin cancer? Almost certainly not. Most mutations only knock, and knock weakly, on the door to chimeras. Part of the reason is ploidy: you carry two copies of your DNA, so if one goes awry there’s another one to bank on (though a counterpoint follows shortly). Besides, DNA is (what’s the geeky word?) redundant. You change a base, and about a quarter of the time it doesn’t matter, because the altered DNA still codes for exactly the same protein. Zoom out for a second from that sun-induced T->C mutation. What do you see?
The neighbouring bases are C and T. So what? So everything. The triplet of bases forms a codon. Codons “code” for amino acids, which line up in chains and contort to form proteins. This particular codon, CTT, codes for leucine. Here’s the fun part. Despite the T turning to a C, the new codon, CTC, also codes for leucine. How is this possible? In a word: evolution. In 21 words: there are more combinations of codons (64) than amino acids (20), so many codons can stand for the same amino acid.
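The redundancy argument is easy to check by brute force. What follows is a minimal Python sketch, not a bioinformatics tool: it assumes the standard genetic code written out as a 64-letter string, it assumes every single-base swap is equally likely (real mutations aren't that even-handed), and the names CODON_TABLE and synonymous_fraction are made up for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction(table):
    """Fraction of single-base swaps in non-stop codons that keep the amino acid."""
    same = total = 0
    for codon, aa in table.items():
        if aa == "*":
            continue  # only start from codons that code for an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a change
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                same += table[mutant] == aa
    return same / total


n_codons = len(CODON_TABLE)
n_amino_acids = len(set(CODON_TABLE.values()) - {"*"})
print(n_codons, n_amino_acids)                   # 64 20
print(CODON_TABLE["CTT"], CODON_TABLE["CTC"])    # L L (leucine both times)
print(round(synonymous_fraction(CODON_TABLE), 2))
```

Under those assumptions the last line comes out at about 0.24, which is where the "about a quarter of the time" figure comes from.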
So DNA is tamper-resistant. But, as new parents might grimly testify, resistant and proof are not the same thing. Ploidy doesn’t protect against autosomal dominant mutations: mutations whose presence in a single DNA copy is sufficient to set off a condition. Besides, that codon redundancy thing? It doesn’t work if the T->C mutation is in the centre of the codon.
In this case, the mutation turns the leucine into a proline. Does that make a difference? Well, it depends. There’s a delicate chain of events that begins, but only begins, with the DNA. DNA pulls the strings; but what about the puppets? Basically,
DNA --> RNA --> protein --> does health stuff!
where the arrow is a poor man’s causal waterfall. (Warning: this is a caricature. The real relationship between DNA, RNA and proteins is very, very complex. Volumes of wisdom have been written about it. Much is known. Still more is less known.) Changing DNA changes RNA, in turn changing protein. Protein “regulates” health, ergo changing DNA… changes health.
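For the curious, here is the caricature as a few lines of Python. It is a sketch under loud assumptions: the nine-base "gene" is invented for the example, the input is taken to be the coding strand (so transcription is just T becoming U), and the function names transcribe, translate and express are hypothetical. Real cells do a great deal more than this.

```python
from itertools import product

# Standard genetic code in RNA letters; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA -> RNA. Assumes the coding strand is given, so T simply becomes U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """RNA -> protein, read three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = RNA_CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def express(coding_dna):
    """The whole waterfall: DNA -> RNA -> protein."""
    return translate(transcribe(coding_dna))


original = "ATGCTTGGA"   # invented sequence: Met, Leu (CTT), Gly
silent = "ATGCTCGGA"     # T->C at the end of the codon
missense = "ATGCCTGGA"   # T->C in the centre of the codon

print(express(original))   # MLG
print(express(silent))     # MLG
print(express(missense))   # MPG
```

The first swap never makes it down the waterfall; the second one turns the leucine (L) into a proline (P), and whether that matters is a question for the protein.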
Or sort of, anyway. Just as redundancy is built into the fabric of DNA, it is also built into the relationship between DNA and “downstream” processes; between puppet and master. Sever a single skein of string, and usually, usually, nothing happens. The puppets just keep on skipping. How? Well, for one thing, the change in amino acid, itself caused by the change in DNA, may not actually, physically change the protein. Or else the protein, despite the change, is able to perform its original function. And so on. This jugaad, this genetic bargaining, is a constant, ongoing process; it’s what separates human beings and other complex organisms from, say, certain viruses, where a single error is sufficient to spawn an instant and usually failed subspecies.
The average human has six million mutations. Some of these are passed on through the “germline”; others are acquired, or “somatic”. On average, somewhere between 5,999,979 and all of the 6,000,000 mutations are completely harmless. I made that up, of course: neither is 21 the exact upper bound on the number of harmful mutations, nor is 6 million the exact number of mutations we all carry. But these numbers, one small, the other fairly large, are indicative. If it takes a lot to change DNA, there is also a lot of DNA vulnerable to change. The two extremes cancel each other out: a mutation on a single base can mean the difference between a day out on the green, and cancer.
The nature of breast cancer: which mutation matters?
Think, for instance, of breast cancer. (Better yet: don’t.) Breast cancer is a prominent example of the harm caused by changing a single DNA base. The BRCA1 and BRCA2 genes, burdened with the name of the cancer they suppress, are both vulnerable to mutations; a woman with a harmful mutation on either of these genes is several times more likely to develop breast and/or ovarian cancer in her lifetime than one without. And what’s the story? What’s the mutation? Here’s one, an A->C, on the BRCA2 gene, detected by the Strand NGS genome browser.
Sure it’s a mutation; but you’re smarter now. You ask: where’s the mutation inside the codon? Just looking at the picture, it’s impossible to say: the mutated base could be the first, second or third base of its codon. How can you tell? Just look; it’s easy.
The protein HGVS column, the third one in the bottom spreadsheet, tells you what you need to know. This particular A -> C mutation turns a Gln to an Arg, i.e., the amino acid glutamine to arginine.
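What the software works out on your behalf is the reading frame. A rough sketch of that arithmetic, assuming you already know the base's 1-based position within the coding sequence (counted from the first base of the start codon), which is something the picture alone doesn't give you. The function name locate_in_codon and the positions 7, 8 and 9 are purely illustrative; they are not the coordinates of the BRCA2 mutation.

```python
def locate_in_codon(coding_position):
    """Map a 1-based coding-sequence position to (codon number, place in codon)."""
    if coding_position < 1:
        raise ValueError("coding positions start at 1")
    codon_number = (coding_position - 1) // 3 + 1
    place_in_codon = (coding_position - 1) % 3 + 1  # 1 = first, 2 = centre, 3 = last
    return codon_number, place_in_codon


for position in (7, 8, 9):
    print(position, locate_in_codon(position))
# 7 (3, 1)
# 8 (3, 2)
# 9 (3, 3)
```

Three neighbouring bases, one codon, three different places in it; and, as we saw with leucine, the place matters.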
All this may still be Greek to you; you might ask, reasonably, if this mutation matters; if cutting this string makes a heap of the puppets; if the mutation results in disease. Look again. The toothpaste blue dbSNP rectangle, right under the genome browser, tells you that this mutation is (drum roll...) known. It has been seen before.
Hovering over the blue rectangle in the picture above opens a text-box that reveals prior work on this mutation (below). We see the all-important rs identification of the mutation; we see that it’s a missense mutation, which means it turns one amino acid into another; we see that some thirty different research consortia, including 1000 Genomes and OMIM, have reported this mutation; and that this mutation is clinically relevant. Looking up the refSNP ID reveals that this mutation is insignificant in and of itself, but is one of a handful, in this case exactly 25, that are associated with an increased risk of breast cancer.
So much for prior work; there’s a lot to be said for shoulder standing. But findings like this one can often be inconclusive; they can often leave you queasy. What does ‘increased risk’ even mean? The good news is that there are other ways. You can appeal to publicly available predictive tools, like SIFT and PolyPhen; there are databases, like dbNSFP. These attempt to use mathematical models to figure out the probability that a certain mutation is deleterious. In this case, attempting to “Find Significant SNPs” in Strand NGS reveals, after some routine clicking through, that the mutation in question isn’t harmful, but that a whole host of others may well be.
Breast cancer: or, why 143,000 is both too much and not enough
Breast cancer thrives on mutations like these, single base swaps that wreak a good deal of havoc on—usually—a woman. Many, like the one above, are merely unsettling. Others are definitive. In the case shown above, the female in question was shown to have a mutation on BRCA1 that cut off a protein mid-sequence, essentially guaranteeing the eventual manifestation of the disease in its owner.
What is 143,000? It’s the number of breast cancer cases in India in the year 2012. Given how much we now know about this disease, the number feels large. Consider, for instance, the dbSNP text-box above. It reveals, at the minimum, some thousands of man-years of effort on identifying that mutation alone. When you think of the 3 billion or so bases that exist in duplicate in each of our cells, and when you consider that mutations can occur in any one of those bases, this feels like the monumental effort it clearly is. Some 200,000 research papers have been written on the BRCA genes; ClinVar identifies 2,345 mutations in BRCA1 and 2,653 mutations in BRCA2 as pathogenic. Despite this work, we’re no closer to preventing breast cancer. Last month, Stanford professor and Fields Medalist Maryam Mirzakhani, with access presumably to some of the best healthcare in the world, lost her four-year battle with breast cancer. All of which is to say: there’s a long way to go.
In India, however, the largely generic problem of “too many cancers” bumps up against the specifically subcontinental problem of “too many people.” 143,000 is clearly—clearly—too few. The billion three hundred million in this country only contribute to 143,000 detections a year; meanwhile, a country with a fourth the number of people detects nearly twice as many. How many people does that leave out? Simple math reveals the cruelty of linear proportion; there are probably at least a few hundred thousand, and as many as a million, silent sufferers of breast cancer in India. It’s the large, characterless numbers that are also tyrannical.
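The simple math, spelled out as a back-of-the-envelope sketch in Python. Only the two Indian figures come from the paragraph above; "a fourth the number of people" is read as exactly a quarter and "nearly twice as many" as exactly double, which are assumptions of this sketch. It also pretends the two countries differ only in how well they detect the disease, ignoring age structure and true incidence, so the result is best read as a ceiling rather than an estimate.

```python
# Figures quoted in the paragraph above.
INDIA_POPULATION = 1_300_000_000
INDIA_DETECTED = 143_000

# Assumptions made for this sketch: "a fourth the number of people" is read
# as exactly one quarter, and "nearly twice as many" as exactly double.
OTHER_POPULATION = INDIA_POPULATION // 4
OTHER_DETECTED = 2 * INDIA_DETECTED


def per_100k(cases, population):
    """Detections per 100,000 people."""
    return cases / population * 100_000


# Linear proportion: apply the other country's detection rate to India.
expected_in_india = OTHER_DETECTED * INDIA_POPULATION // OTHER_POPULATION
undetected = expected_in_india - INDIA_DETECTED

print(round(per_100k(INDIA_DETECTED, INDIA_POPULATION)))   # 11
print(round(per_100k(OTHER_DETECTED, OTHER_POPULATION)))   # 88
print(expected_in_india)                                   # 1144000
print(undetected)                                          # 1001000
```

An eightfold gap in detections per head; and, at the ceiling, about a million people a year who are not being counted.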
The good news is nascent, probably a drop in the ocean; but hopers gonna hope. Small, focused outfits like Strand Life Sciences now offer clinical genetic tests that can tell you if you’re prone to breast cancer; whether you acquired or inherited particular mutations that are indicative of the disease; and then tell you what you can do about it. Underlying this complex clinical workflow is Strand NGS, which allows you to follow the delicate thread of bioinformatic inference and deduction needed to go from the “proband” (the patient) to diagnosis. For India, clinical genetic workflows like the ones Strand offers are powerful tools to battle a disease that, for too long, has thrived unopposed.