The clinical workflow, retinoblastoma and split alignment


An infant with leukocoria, a common retinoblastoma symptom visible in photographs taken with a camera flash. The normal right eye reflects red, as it should; the affected left eye reflects white.


Retinoblastoma is an “oma”, a tumour, named for where it presents: the retina. It’s a cancer that attacks the eye. Worse, because 80% of retinoblastoma diagnoses occur before the age of 3, it’s a cancer that attacks, almost exclusively, the eyes of little children. Retinoblastoma is a “good cancer”: with timely treatment, 90% of RB patients survive into adulthood. But the fine print is terrifying: extensive radiotherapy or chemotherapy; a lost or severely circumscribed childhood; the possibility of losing vision in one eye or, in bilateral cases, both.

India’s retinoblastoma numbers aren’t good. Each year, India accounts for 20% of all RB cases diagnosed worldwide. Of these patients, between 10 and 30% go on to lead functioning lives as adults; the rest are disabled for life or die. Neither blindness nor death from RB ought to be a reality. But India’s masses, a majority of them rural, are mostly unaware of the disease; or, despite awareness, are usually diagnosed in the later, more devastating stages; or, despite early diagnosis, can’t afford treatment. Lack of knowledge, lack of means, lack of access: the trifecta is poisonous and, in a relentless condition like cancer, unsurprisingly lethal.

A genetic disease, retinoblastoma is caused by a change, or mutation, in the genetic code. Genetic mutations come in three basic varieties: substitutions swap one base (or a stretch of bases) for another; insertions add a base or a stretch of bases where none previously existed; and deletions remove bases. These categories describe what a mutation looks like; what really matters is its effect on the human body. For reasons that involve 3 billion years of trial and genetic error, our genetic code is resistant to mischief. Most mutations do not cause a loss of function: the average human carries about 4 million mutations, and only a handful of these may, over the course of a lifetime, cause harm.
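
The three categories can be made concrete with a small sketch that classifies a variant from its reference and alternate alleles, in the style of a VCF record. The function name `classify_variant` is hypothetical, and the rules assume simple, anchored alleles (for insertions and deletions, both alleles share the same leading base, as in VCF); real variant callers must also handle complex events that mix several of these changes.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a simple variant from its reference (ref) and alternate (alt) alleles.

    Assumes VCF-style alleles: for insertions and deletions, both alleles
    start with the same anchor base. Hypothetical helper, for illustration.
    """
    if len(ref) == len(alt):
        # Same length: one base (or a stretch of bases) swapped for another.
        return "substitution"
    if len(alt) > len(ref) and alt.startswith(ref):
        # The alternate allele extends the reference: bases were added.
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        # The reference extends the alternate allele: bases were removed.
        return "deletion"
    # Anything else mixes several kinds of change.
    return "complex"


# A Figure D-style deletion written VCF-style: base 3 is the anchor and
# bases 4 through 7 are removed. The base letters are made up.
print(classify_variant("GTTCA", "G"))  # deletion
```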

However, like most cancer-causing mutations, the one in retinoblastoma isn’t just harmful: it’s positively disfiguring. In the cases we consider here, retinoblastoma is caused by a deletion, the removal of a contiguous stretch of bases (Figure D).

Figure D: A 4-base deletion. Bases 4 through 7 of the reference are deleted.

Like all mutations, RB deletions can be germline, present in every cell of the body and often inherited from a parent, or somatic, arising in the body’s own cells rather than being inherited. Deletions in retinoblastoma can span thousands, and sometimes millions, of bases. What they have in common is the gene they disrupt: RB1. RB1 encodes the retinoblastoma tumour suppressor protein, which keeps cell division in the developing retina in check (Figure P). The RB1 deletion disables this protein, and cancer, as a result, is loosed on the eye.

Figure P: The crystal structure of the retinoblastoma tumour suppressor protein. The deletion of a portion of RB1 disables the suppressor protein and leads to retinoblastoma.


How are deletions in RB1 detected? The answer is long and complex, and involves detours through chemistry, biology, computer science, and clinical pathology. It starts with the patient: usually, as we’ve seen, a child under the age of 3. A tissue sample from the patient is subjected to targeted sequencing, in which the stretch of the patient’s genome containing RB1 is chemically sequenced. Targeted sequencing generates a set of “reads”: short substrings of the patient’s genetic code. A defect in the genetic code is reflected in the reads, or so the theory goes. In practice, as we shall see, inferring mutations from reads is far from straightforward.
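
As a toy model of what sequencing produces, the sketch below tiles a sequence with fixed-length, overlapping reads. The helper `simulate_reads` and its parameters are hypothetical; real targeted sequencing yields reads at random positions, with sequencing errors and often in pairs, none of which this models. The sequences are made up, with a Figure D-style deletion of reference bases 4 through 7.

```python
def simulate_reads(sequence: str, read_length: int, step: int) -> list[tuple[int, str]]:
    """Tile `sequence` with reads of `read_length`, starting every `step` bases.

    Returns (0-based start, read) pairs. Hypothetical helper, for illustration.
    """
    return [
        (start, sequence[start:start + read_length])
        for start in range(0, len(sequence) - read_length + 1, step)
    ]


reference = "ACGTTGCAGGATCCA"            # made-up healthy reference
patient = reference[:3] + reference[7:]  # delete reference bases 4-7 (1-based)

for start, read in simulate_reads(patient, read_length=6, step=3):
    # A read that spans the deletion no longer occurs verbatim in the reference.
    print(start, read, read in reference)
```

Here the first read, ACGAGG, straddles the junction left by the deletion and occurs nowhere in the reference; the second, AGGATC, lies wholly downstream of it and matches exactly.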

In the bioinformatics workflow that follows, the sequenced reads are mapped (aligned) to a healthy reference genome, and mutations are identified by comparing the alignments against the reference. If the RB1 deletion is among the mutations found, a report is drawn up, confirming the diagnosis and listing a series of therapy recommendations. If the condition is diagnosed early, and if the recommendations are followed, the child can make a full recovery and keep both eyes in the process. Starting with the tissue assay (biology), moving on to targeted sequencing (chemistry) and bioinformatics (computer science), and concluding with a report and a list of therapy recommendations (biology/pathology): this is Strand’s clinical workflow.
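
Mapping, at its simplest, means finding where each read occurs in the reference. Production aligners use compressed full-text indexes and tolerate mismatches and gaps; the sketch below only builds a k-mer hash index and reports exact, ungapped matches, seeding on each read’s first k-mer. `build_index` and `map_read` are hypothetical helpers, and the reference is a made-up toy sequence.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in `reference` to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Return the 0-based positions where `read` matches `reference` exactly.

    The read's first k-mer is the seed; each candidate is then verified in full.
    """
    return [
        pos
        for pos in index.get(read[:k], [])
        if reference[pos:pos + len(read)] == read
    ]


reference = "ACGTTGCAGGATCCA"
index = build_index(reference, k=3)
print(map_read("AGGATC", reference, index, k=3))  # [7]: read from intact sequence
print(map_read("ACGAGG", reference, index, k=3))  # []: read spanning a deletion
```

The read that spans the deletion maps nowhere end to end, even though its first three bases do occur in the reference.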


Figure R: Like Figure D, this shows how reads from the patient’s sequence express a deletion. On the bottom are locations on the read. Because bases 4 through 7 of the reference are deleted from the read, only 5 read bases participate over a nine-base stretch of the reference.

One of the many challenges in confirming retinoblastoma lies in the length of the deletion. In Figure D, the deletion is four bases long. Reads from the sequencing process are longer, potentially spanning a hundred bases or more, so we can expect reads that cover the deleted region to fully express the deletion (Figure R). However, RB1 deletions in retinoblastoma are long, far longer than a read. How can we confirm these long deletions?
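
Aligners record a gapped alignment like Figure R’s compactly as a CIGAR string, defined in the SAM format specification: 3M4D2M reads as three aligned bases, four bases deleted from the reference, then two more aligned bases. The sketch below recovers the two numbers in Figure R’s caption, 5 read bases over a 9-base stretch of reference; the helper `cigar_spans` is hypothetical, while the operation semantics follow the SAM specification.

```python
import re

# Per the SAM specification: CIGAR operations that consume read (query)
# bases, and those that consume reference bases.
CONSUMES_READ = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def cigar_spans(cigar: str) -> tuple[int, int]:
    """Return (read bases consumed, reference bases spanned) for a CIGAR string."""
    read_len = ref_len = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in CONSUMES_READ:
            read_len += int(length)
        if op in CONSUMES_REF:
            ref_len += int(length)
    return read_len, ref_len


print(cigar_spans("3M4D2M"))     # (5, 9): Figure R
print(cigar_spans("3M1000D2M"))  # (5, 1005): a long deletion written as one gap
```

The second line is the catch. The notation can express a long deletion; scoring is the obstacle. Under the affine gap penalties typical of gapped aligners, opening and extending a 1,000-base gap costs far more than the few matched bases on either side earn, so such an alignment is unlikely to be reported.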

The answer lies in split alignment. A small but critical algorithmic piece of Strand’s clinical workflow, split alignment is designed to detect long deletions such as those in retinoblastoma. The idea is simple. In Figure R, notice how bases 1 through 3 of the read map to locations 1 through 3 on the reference, and bases 4 through 5 of the read map to locations 8 through 9. In other words, the read is “split”: part of it maps to one stretch of the reference (1-3), and the rest maps elsewhere (8-9).

Long deletions are exactly the same, except “elsewhere” could be a thousand, a million, ten million bases away (Figure L).

Figure L: Similar to Figure R, this shows a 1,000-base deletion, from base 4 to base 1003. The read is split: its first three bases map to locations 1 through 3 on the reference, and its last two map to locations 1004 through 1005.
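
The splitting idea can be sketched as a brute-force search: try every split point, and accept one where the left piece and the right piece each match the reference exactly once, with a gap between them. This is a toy under stated assumptions (exact matching, a forward-strand read, a made-up reference, and the hypothetical helpers `find_all` and `split_align`), not StrandNGS’s algorithm; real split aligners seed with approximate matches and score candidate splits. The flanks are seven and five bases long, rather than Figure L’s three and two, so that each piece maps uniquely.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """All 0-based positions where `pattern` occurs in `text` (overlaps allowed)."""
    positions = []
    pos = text.find(pattern)
    while pos != -1:
        positions.append(pos)
        pos = text.find(pattern, pos + 1)
    return positions


def split_align(read: str, reference: str, min_piece: int = 2):
    """Brute-force split alignment of a forward-strand read.

    Returns (split, left_pos, right_pos): read[:split] matches the reference
    exactly and only at left_pos, read[split:] exactly and only at right_pos,
    with reference bases skipped in between. Returns None if no split qualifies.
    """
    for split in range(min_piece, len(read) - min_piece + 1):
        left_hits = find_all(reference, read[:split])
        right_hits = find_all(reference, read[split:])
        if len(left_hits) == 1 and len(right_hits) == 1:
            left_pos, right_pos = left_hits[0], right_hits[0]
            if right_pos > left_pos + split:  # a gap: reference bases deleted
                return split, left_pos, right_pos
    return None


# Made-up reference: 7-base left flank, 1,000 bases the patient lacks,
# 5-base right flank.
reference = "GATTACA" + "C" * 1000 + "TGGTC"
read = "GATTACA" + "TGGTC"

split, left_pos, right_pos = split_align(read, reference)
deleted = right_pos - (left_pos + split)
print(split, left_pos, right_pos, deleted)  # 7 0 1007 1000
```

In 1-based terms the pieces land at locations 1 through 7 and 1008 through 1012, and the 1,000 skipped bases are the inferred deletion, as in Figure L. Brute force rescans the whole reference for every split point, which is why practical tools index the reference instead.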

Split alignment rests on the insight that “split” reads signal long deletions. Given a read and an approximate match to a location on the reference, say location L, split alignment splits the read, generating a near-perfect match at location L and another at location M. The long deletion is then inferred to lie between location L and location M. L and M can be arbitrarily far apart (long deletions), on different strands (inversions), or on different chromosomes altogether (translocations). Much of the art in split alignment involves figuring out (i) where to split the read and (ii) where to map the resulting pair of segments.
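
The rule in the paragraph above, what the relative placement of the two pieces implies, can be written down directly. In this sketch the `Segment` type, the 0-based half-open coordinates, the chromosome names, and `infer_event` are all our own illustration; it assumes the first segment comes from the start of a forward-strand read, and real structural-variant callers pool evidence from many split reads rather than trusting one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Where one piece of a split read maps (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def infer_event(first: Segment, second: Segment) -> str:
    """Name the rearrangement implied by a read split into `first`, then `second`.

    Assumes `first` holds the start of a read aligned to the forward strand.
    """
    if first.chrom != second.chrom:
        return f"translocation: {first.chrom} joined to {second.chrom}"
    if first.strand != second.strand:
        return f"inversion on {first.chrom}"
    if second.start > first.end:
        # Same chromosome and strand: the reference bases between the two
        # pieces are absent from the read, so they are inferred to be deleted.
        return f"deletion {first.chrom}:{first.end}-{second.start} ({second.start - first.end} bp)"
    return "unresolved"


# Figure L in 0-based, half-open terms: the pieces map to [0, 3) and [1003, 1005).
print(infer_event(Segment("chrA", 0, 3, "+"), Segment("chrA", 1003, 1005, "+")))
```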

Figure S: A long deletion in RB1, detected by StrandNGS’s clinical workflow. The length of the deletion, 9,030 bp, is visible at the very top. Purple and blue reads are split and connected by dashed lines; they span the deletion.

StrandNGS, our flagship bioinformatics software, supports split alignment. Figure S shows the result of applying the StrandNGS split alignment workflow (itself part of the larger clinical workflow) to a sample from a suspected retinoblastoma patient. Reads spanning the deletion can be observed in the elastic genome browser, which enables the concurrent inspection of distant parts of the genome. In the figure, both sides of the deletion in RB1 are visible, with split reads, in purple and connected by dashed lines, spanning the roughly 9,000 base pairs that were deleted.

We started with a child suspected of having retinoblastoma, sequenced critical parts of his genome, subjected the sequenced reads to split alignment, and detected the culprit: a long deletion in RB1, confirming the diagnosis. Great. But what does this mean for the child? Can we cure the disease? Can we save the eye? The answers turn on something simple: how soon did we detect it? Prognoses for stage 1 and some stage 2 retinoblastomas are very good: the eyes usually survive, and the child recovers to enjoy a normal childhood. In principle, early detection should be possible for germline cases: if one or both parents carry the mutation, there’s a good chance that it has been passed on to one or more of the children.

This is where Strand’s clinical workflow, and especially its germline test, matters: it helps detect asymptomatic, “good” cancers like retinoblastoma early, dramatically improving the odds that the patient goes on to lead a normal life. As we’ve seen in this post, split alignment is just one of the many steps in the clinical workflow that make this possible.

See Strand’s study on retinoblastoma, published in Molecular Vision (Molvis).

Check out split alignment in StrandNGS.

Check out Strand’s clinical workflow.

Reference: Indian Journal of Cancer.