Mapping RefSeq transcripts to the genome using UCSC

Transcript annotations are used extensively in NGS data analysis. In RNA-Seq, they are used at every step of the pipeline – to map spliced reads against the genome, perform quantification, detect novel exons, etc. In DNA-Seq, they are used to predict the effect of variants detected in the sample. Clearly, accurate transcript annotations are vital …

A Case for Long CIGARs: Achieving 50% Compression of BAM Files

I am studying NA06984.454.MOSAIK.SRP000033.2009_11.bam, not to find SNPs or structural variations or to derive any biological insights into the CEU population, but simply from an IT storage perspective. How much disk space is really needed to store all the information in the 1.23 million matches present in this 151 MB file? If I can get to …