Accuracy puts the there in speed: DNA-Seq stuff!

In certain circles, asking why speed is good is tantamount to heresy. I once submitted a paper with the phrase “computational performance is not an explicit goal of this work” in the abstract. It was rejected; in his explanatory note, the reviewer said that my approach was “unlikely to yield great speedups,” and that he didn’t know how it could “ever be fast.”

In other company, speed isn’t that… impressive. In a reversal that writers more portentous than I would call cosmic, a few months after the dour missive above, I received a spoken comment on yet another piece of work: “I don’t care about speed. I care about accuracy. I can spend days doing my analysis, but it had better be right!”

What can we agree on? Well, for starters, we can agree that, contra the commenter above, speed is good. Speed argues for itself. Everybody feels the weight of time, and so its warring counterpart, speed, is always welcome.

But accuracy is the severe, beady-eyed parent of speed. Accuracy puts speed through school; speed takes afternoons off and threatens to lope off at the slightest provocation. Accuracy is weathered and wizened and armchair-bound. Speed eats Paapdi Chaat from the roadside without even washing its hands, yaar. Accuracy keeps speed honest. Speed keeps accuracy grudgingly, peevishly, alive.

So it’s easy to have one without the other; but can you have both? Witness Strand NGS 3.0. DNA-Seq in three-dot-oh is twice as fast as a contemporary BWA + GATK workflow. But that’s not all; it’s also accurate, generating callsets with high precision and recall on a host of whole-exome and whole-genome samples. You can read about some of the chatpata algorithmic details here (after logging in, click the link titled “Algorithms Engineering at Strand Life Sciences”). For now, some pictures should do.

DNA-Seq in Strand NGS vs. BWA + GATK. Timings are on a 9-gigabase whole-exome sample.
Accuracy of DNA-Seq in Strand NGS.
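For readers outside the field: “precision” and “recall” of a callset are the standard benchmarking metrics. Precision is the fraction of called variants that appear in a trusted truth set; recall is the fraction of truth-set variants that were called. The sketch below is illustrative only and is not how Strand NGS scores its callsets. It assumes variants can be compared by exact (chromosome, position, ref, alt) match; real benchmarks use dedicated comparison tools that also reconcile equivalent representations of the same variant. The `Variant` type and `precision_recall` function are hypothetical names.

```python
from typing import Iterable, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record; matched by exact equality."""
    chrom: str
    pos: int
    ref: str
    alt: str


def precision_recall(calls: Iterable[Variant],
                     truth: Iterable[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of a callset against a truth set.

    Assumes exact-match comparison; no normalization of indel
    representation is attempted.
    """
    call_set = set(calls)
    truth_set = set(truth)
    tp = len(call_set & truth_set)   # called and true
    # tp + fp == len(call_set); tp + fn == len(truth_set)
    precision = tp / len(call_set) if call_set else 0.0
    recall = tp / len(truth_set) if truth_set else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = [
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 200, "C", "T"),
        Variant("chr2", 50, "G", "A"),
        Variant("chr2", 75, "T", "C"),
    ]
    calls = [
        Variant("chr1", 100, "A", "G"),  # true positive
        Variant("chr2", 50, "G", "A"),   # true positive
        Variant("chr3", 10, "A", "C"),   # false positive
    ]
    p, r = precision_recall(calls, truth)
    print(round(p, 3), r)  # prints: 0.667 0.5
```

A caller that emits fewer, more conservative calls tends to gain precision and lose recall; “accurate” in the sense above means keeping both high at once.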

For more information, do read through our application note on Algorithms Engineering, which presents the algorithmic and engineering challenges involved. The note focuses mainly on DNA-Seq, highlighting the challenges we sought out and how the Strand team met them.