Interesting findings by ChIP-Seq and DNA-Seq analysis using Strand NGS

2014 was a great year, with exciting features and enhancements added to Strand NGS. We take this opportunity to thank our clientele and well-wishers for their support and feedback. We wish you all a happy and prosperous new year and look forward to a more fruitful engagement in 2015.

Here are two recent publications that analyse ChIP-Seq and DNA-Seq data using Strand NGS.
1. A Comprehensive Profile of ChIP-Seq-Based PU.1/Spi1 Target Genes in Microglia by Satoh et al.
2. Localised Dominant Dystrophic Epidermolysis Bullosa with a Novel de Novo Mutation in COL7A1 Diagnosed by Next-generation sequencing by Nagai et al.

The paper by Satoh et al. investigates the biological role of the transcription factor PU.1 in the regulation of microglial functions. Though PU.1 plays a vital role in microgliogenesis, the comprehensive profile of PU.1/Spi1 target genes in microglia was unknown. In this paper, Strand NGS was used to analyse the SRP036026 ChIP-Seq data set and identify the role of PU.1/Spi1 in microglial gene regulation. Using Strand NGS, 5,264 ChIP-Seq-based Spi1 target protein-coding genes (including Spi1, Irf8, Runx1, Csf1r, Csf1, Il34, Aif1 (Iba1), Cx3cr1, Trem2, and Tyrobp) were identified in BV2 mouse microglial cells. Motif analysis by GADEM revealed that the PU-box consensus sequence (5’-GAGGAA-3’) was located on 80.3% of the peaks detected by MACS. Downstream pathway analysis showed that the ChIP-Seq-based Spi1 target genes have a significant relationship with diverse pathways essential for the normal function of monocytes/macrophages (such as endocytosis, phagocytosis, and lysosomal degradation). Hence PU.1/Spi1 was found to have an important role in microglial gene regulation, and aberrant regulation of these target genes may contribute to neurodegenerative diseases through the accumulation of activated microglia.
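To make the "80.3% of the peaks" figure concrete, here is a minimal sketch of how the fraction of peaks carrying a known motif could be computed. It is a plain string search for the PU-box on either strand, not the de novo motif discovery that GADEM performs, and the peak sequences and function names are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def fraction_with_motif(peak_seqs, motif="GAGGAA"):
    """Fraction of peak sequences that contain the motif on either strand."""
    if not peak_seqs:
        return 0.0
    rc = reverse_complement(motif)
    hits = sum(1 for s in peak_seqs if motif in s.upper() or rc in s.upper())
    return hits / len(peak_seqs)

# Hypothetical peak sequences, invented for this example
toy_peaks = ["TTGAGGAAGT", "ACTTCCTCAA", "ACGTACGTAC", "GGGAGGAATT", "CCCCCCCCCC"]
print(fraction_with_motif(toy_peaks))
```

In a real analysis the sequences would be extracted from the genome using the peak coordinates reported by the peak caller.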

The second paper by Nagai et al. is a case study of a 10-month-old female infant with localised Dominant Dystrophic Epidermolysis Bullosa. Targeted next-generation sequencing of 16 genes associated with Dystrophic Epidermolysis Bullosa was performed on the proband’s peripheral blood. The sequence data were aligned and analysed using the DNA variant analysis workflow in Strand NGS (formerly Avadis NGS). A heterozygous single nucleotide variation at chr3:g.48616827C>T (negative strand), corresponding to the missense mutation p.Gly1761Asp in the triple helix domain of COL7A1, was detected. This high-confidence de novo mutation was also confirmed by Sanger sequencing. The mutation was detected only in the proband and was not found in the parents, in 100 healthy Japanese alleles, or in dbSNP.
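The strand note matters when reading this variant. Assuming the g. change is written on the reference forward strand, a gene on the negative strand sees the complementary change, so C>T becomes G>A on the coding strand, which is what turns a glycine codon (GGN) into an aspartate codon (GAT or GAC). The sketch below illustrates this; the wild-type codon GGT is a hypothetical choice, since the actual codon is not given here.

```python
# Watson-Crick complements, used to move between strands
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Partial codon table: only the codons this illustration needs
CODONS = {"GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
          "GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu"}

def on_coding_strand(ref, alt):
    """Express a forward-strand substitution on the opposite (coding) strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]

def substitute(codon, position, alt):
    """Return the codon with the base at 0-based position replaced by alt."""
    return codon[:position] + alt + codon[position + 1:]

coding_ref, coding_alt = on_coding_strand("C", "T")  # genomic C>T
wild_type = "GGT"  # hypothetical glycine codon, chosen only for illustration
mutant = substitute(wild_type, 1, coding_alt)
print(coding_ref, ">", coding_alt, ":", CODONS[wild_type], "->", CODONS[mutant])
```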

Webinar on ‘Integrated Multi-Omics Analysis with Strand NGS and Agilent GeneSpring 13 – Case Study’

Webinar on ‘Integrated Multi-Omics Analysis with Strand NGS 2.1 and Agilent GeneSpring 13 – Case Study’ on 19 and 20 November

Presented by ‘Agilent Technologies’ and ‘Strand Life Sciences’

Abstract
Integrating next generation sequencing data with other omics studies is now possible with the release of GeneSpring 13 and Strand NGS 2.1, opening up new avenues for analysis and interpretation of NGS experiments. In this webinar, we will demonstrate the new integrated analysis workflow using high-throughput microarray and next generation sequencing data.
Using a case study, the following functionality of the multi-omics approach will be highlighted:
• Export of relevant information (reads, region lists, entity lists) from Strand NGS 2.1, for import into GeneSpring 13.0.
• Create an experiment in GeneSpring using the Strand NGS data.
• Perform correlation study and pathway analysis in a multi-omics context.

Speakers:
Dr. Pramila Tata, Director – Applied Science, Strand Life Sciences
Dr. Carolina Livi, Segment Marketing Manager, Agilent Technologies

Webinar details:
Session 1 for SAPK/APFO: November 18, 2014; 8:00 PM PST (that is 19 November, 9:30 AM IST)
Session 2 for EMEA and India: November 19, 6:00 AM PST (that is 19 November, 7:30 PM IST)
Session 3 for AFO: November 20, 8:00 AM PST (that is 20 November, 9:30 PM IST)

Register on or before November 18, 2014 at http://www.strand-ngs.com/webinar_registration

About the speakers:
Dr. Pramila Tata, Director – Applied Science, Strand Life Sciences, has over 15 years of experience in cancer research, software support, product development, and training technical support teams and field application scientists. Pramila earned her Ph.D. in Molecular Biology from the Indian Institute of Science, Bangalore. Prior to joining Strand, she was with the Fred Hutchinson Cancer Research Center in Seattle, working in cancer biology using budding yeast as a model system. At Strand, Pramila leads the application science team.
Dr. Carolina Livi, Bioinformatics Segment Manager, Agilent Technologies, has over 8 years of experience in the fields of bioinformatics, regulatory biology, and research in cancer and aging. Carolina has a Ph.D. in Molecular Developmental Biology from the California Institute of Technology. Prior to joining Agilent, Dr. Livi worked at the University of Texas Health Science Center at San Antonio (UTHSCSA) as a Research Assistant Professor in the Department of Molecular Medicine. At Agilent, Dr. Livi is Bioinformatics Segment Manager for Life Sciences Research in Academia and Government within the Segment Marketing group.
For more information, please write to sales@strandngs.com

Meet Strand at ASHG 2014 and find out about Strand’s solutions ‘Strand NGS’ and ‘StrandOmics’

Strand is excited to be an exhibitor at the 64th Annual Meeting of the American Society of Human Genetics in San Diego from 18 – 22 October 2014, one of the world’s largest human genetics meetings. At ASHG, Strand will demonstrate the latest version of its state-of-the-art NGS data analysis tool ‘Strand NGS’ and its interpretation and reporting platform ‘StrandOmics’.

At ASHG 2013, Strand demoed Avadis NGS v1.5 and presented posters on ‘Aneuploidy and Normal Cell Contamination Aware Approach to Detect Copy Number Variations in Cancer Using Next Generation Sequencing Data’ and ‘Shortening the Diagnostic Odyssey: Integrating Genomic, Structural, and Phenotypic Information to Reduce Time of Rare Disease Diagnosis’. This year we are thrilled to present Avadis NGS under its new brand name ‘Strand NGS’, along with the StrandOmics tool, for the first time at ASHG. Our representatives will highlight many new visualization and interpretation features and utilities at booth #338 at ASHG 2014.

This year, we are also presenting four new posters in the sessions on ‘Bioinformatics and Genomic Technology’ and ‘Clinical Genetic Testing’. The posters highlight benchmarking studies we conducted to compare our variant calling and alignment algorithms with some of the other options available to scientists, present case studies from our clinical genetic testing practice in India, and illustrate our variant interpretation and reporting platform, StrandOmics. Detailed information about each of these scientific posters and the agenda is given below. For more information, visit our ASHG webpage.

Come and meet our experts at ASHG 2014, booth #338, to learn more about Strand’s solutions.

To schedule a meeting or request a demo of Strand NGS, please write to:
Vinay Paramasivan at vinayp@strandls.com

To schedule a meeting or request a demo of StrandOmics, please write to:
Adrienne Craig-Kennard at adrienne@strandls.com
Dr. Smita Agrawal at smita@strandls.com

About Strand NGS

Strand NGS is an integrated desktop software package that enables biomedical researchers to manage, analyze, and visualize data from next-generation sequencing (NGS) experiments. The software is designed to help biologists make sense of NGS data by providing a rich, visual environment for QC, analysis, and interpretation of ChIP-Seq, RNA-Seq, small-RNA-Seq, DNA-Seq and Methyl-Seq data. The enterprise version of Strand NGS (server edition) lets multi-member teams collaborate, share data, and speed up analysis, while the easy backup-restore option allows safe and secure data transfer. For more information visit http://strand-ngs.com/

About StrandOmics:

StrandOmics is a variant calling and interpretation tool designed to support sequencing lab workflows. The tool reduces variant interpretation and reporting time from days to a few hours. StrandOmics is based on first-hand experience interpreting variants from hundreds of genomes. To know more, visit http://strandomics.com/

Webinar on Integrated Pathway Analysis in Strand NGS

Webinar on Integrated Pathway Analysis in Strand NGS – 27 August 2014

Presenter:  Dr Veena Hedatale, Senior Application Scientist, Strand Life Sciences

Abstract: Strand NGS (formerly Avadis NGS) supports functional analysis of entities from diverse experiment types to understand their role in a biological process. This webinar will illustrate various ways of integrating next generation sequencing data from different experiments. With a focus on visualization of biological data, analysis steps showing the use of an entity list to find statistically significant pathways will be discussed. The pathways can either be derived from literature (like NLP, MeSH) or be curated pathways (like WikiPathways or BioCyc). The webinar will also provide more insight into how one can overlay data from single or multiple sequencing experiments onto the same pathways simultaneously.

Details:

Session 1: August 27; Europe + Asia; 11 AM Central European Time (2:30 PM IST)

Session 2: August 27; North + South America; 9 AM Pacific Standard Time (9:30 PM IST)

Confirm your participation before 27 August 2014 by registering online at http://www.strand-ngs.com/webinar_registration

About the presenter: Dr. Veena Hedatale has a PhD in Plant Genetics from Radboud University, Netherlands, focused on meiosis and recombination. Her prior academic experience at Cornell University was on genetic mapping and gene transformation in rice. She has worked with Monsanto, where she contributed to data mining, database development, and gene/promoter/pathway discovery for traits related to yield and stress in crop species. At Strand, Veena has worked on pharmacogenomic analysis of targets and gene family analysis projects. Currently, she is part of the Strand NGS Application Science team and is involved in the analysis of next generation sequencing data.

For more information, please write to sales@strandngs.com

Configuring SNP detection pipelines for accurate analysis of clinical samples

Webinar of the Month Series: Configuring SNP detection pipelines in Avadis NGS for accurate analysis of clinical samples

Presenter:  Dr Vamsi Veeramachaneni, Vice President, Strand Life Sciences

Abstract: Running a SNP detection pipeline and identifying high quality variant calls quickly is challenging. This is especially true in the case of clinical labs where multiple panels are used and kit-specific biases can result in false positive SNP predictions.

In this webinar, Dr Vamsi will show how one can use the powerful visualization features of Avadis NGS to quickly detect false positive SNP predictions, identify the cause of the errors, and fine-tune the detection pipeline for accurate analysis.

Details:

Session 1: Feb 26; Europe + Asia; 10 AM Central European Time

Session 2: Feb 26; North + South America; 9 AM Pacific Standard Time

Confirm your participation on or before 24th Feb 2014 by registering online for free.

Mapping RefSeq transcripts to the genome using UCSC

Transcript annotations are extensively used in NGS data analysis. In RNA-Seq, they are used at every step of the pipeline: to map spliced reads against the genome, perform quantification, detect novel exons, etc. In DNA-Seq, they are used to predict the effect of variants detected in the sample. Clearly, accurate transcript annotations are vital for NGS work.

Many researchers prefer to work with RefSeq transcripts because they are manually curated. But there is a problem. The RefSeq transcript project provides the transcript sequence and the location of exons on the transcript sequence, but does not provide the genomic coordinates for the exons. So one common strategy is to obtain the genomic coordinates from UCSC. The folks at UCSC routinely align the RefSeq transcript sequences against the genome using BLAT and make the results available as “refFlat” files on their download site.

Unfortunately, these BLAT alignments are sometimes wrong.

Shown below is the transcript track for TNNI3, which is a gene on the negative strand of chromosome 19. Note that the coding region of the first exon in the “RefSeq genes” picture occupies 22bp, while the UCSC track at the top shows only 11bp.

Exon 1 of TNNI3 in UCSC

The RefSeq transcript that was used by UCSC for alignment can be found by clicking on the TNNI3 label in the RefSeq genes track; it is NM_000363.4. A portion of the transcript entry is shown below.

TNNI3 RefSeq transcript details

The RefSeq entry clearly indicates that only 11 bases (144-154) at the end of the first exon represent coding bases. Moreover, the transcript has a CCDS entry indicating that there is a genomic alignment which translates to the protein sequence shown.

To get a better understanding of the problem, we looked at the UCSC and the RefSeq transcripts in more detail in the Elastic Genome Browser. The introns have been compressed so that exonic and essential splice site sequences can be seen in more detail.

TNNI3 in EGB

Some of the observations from the above picture are:

  • the alignment for the RefSeq transcript leads to a premature stop codon very early on,
  • the essential splice site signals are correct in the UCSC transcript but wrong in the RefSeq transcript alignment.

These are sanity checks that any researcher using the UCSC alignments of RefSeq transcripts should incorporate before carrying out analysis.
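Both sanity checks are easy to automate. The following is a minimal sketch, assuming you have already extracted the coding sequence and the intron sequences in transcript orientation (that is, reverse-complemented for genes on the negative strand). The function names are our own, and only the canonical GT...AG splice signal is tested, so rare non-canonical introns would be flagged as well.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_premature_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon
    that occurs before the final codon, or None if there is none."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the final codon is allowed to be a stop
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None

def has_canonical_splice_sites(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

# Toy sequences, invented for illustration
print(first_premature_stop("ATGGCCTAA"))         # clean open reading frame
print(first_premature_stop("ATGTGAGCCTAA"))      # stop right after the start codon
print(has_canonical_splice_sites("GTAAGTCCCTTTCAG"))
print(has_canonical_splice_sites("GCAAGTTTTCAC"))
```

A transcript alignment that fails either check deserves a closer look in a genome browser before it is used for annotation.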

And, finally, the picture also suggests why this error happened. The incorrect extension to exon 1 in the RefSeq transcript alignment (GCATCACTCAC) is very similar to the sequence of the small exon 2 present in the UCSC transcript (GCATCGCTGCTC). It is possible that BLAT is not well suited for detecting small intermediate exons, especially if there is an alternate alignment which is very similar.
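One way to put a number on "very similar" is the edit distance between the two sequences quoted above. The sketch below uses a standard Levenshtein dynamic program; it is only an illustration of the similarity and says nothing about how BLAT itself scores alignments.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-base substitutions,
    insertions, and deletions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]

wrong_extension = "GCATCACTCAC"  # extension of exon 1 in the RefSeq alignment
small_exon_2 = "GCATCGCTGCTC"    # exon 2 in the UCSC transcript
print(edit_distance(wrong_extension, small_exon_2))
```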

From yeast to mice to chicks to stem cells – 4 ways RNA-Seq analysis sheds light on the world around us (a four part series)- Part II

In my previous blog post, I walked you through a yeast paper where the authors had mapped its mitochondrial transcriptome. They had used the RNA-Seq analysis tool from Avadis NGS in clever ways to figure out details about the transcriptome that played a key role in piecing together the final map. In this week’s post, I briefly describe how gene expression data was used to show how, evolutionarily, the partially bony tongues of birds were transformed into a muscular tongue in mammals.

Briefly, Liu H. et al., in their PNAS paper, demonstrated the importance of the Odd skipped related 1 (Osr-1) transcription factor in causing morphological differences between the tongues of birds and those of mammals. This paper is a thorough compilation of details that illustrate how differences in gene expression patterns can cause differences in the way an appendage or organ develops in different species, or in this particular case, different classes of organisms.

In mammals, the tongue is just striated muscle, while in birds it is mostly cartilaginous. Liu H. et al. approach this problem from a developmental angle. Since the tongue skeleton and framework are formed at the time of embryogenesis, specifically during neural crest mesenchyme differentiation, the authors looked at changes in gene expression in mouse and chick embryos. The Osr-1 and Osr-2 genes are implicated in embryogenesis and organogenesis and are thus investigated for their role in tongue development in this paper. Most of the paper deals with histologically mapping Osr-1 and Osr-2 expression in embryos at different times during embryogenesis, as well as in embryos that had increased or reduced expression of these genes, and checking for morphological changes in developing tongue tissue. They were able to show that the presence of Osr-1 at the wrong moment in chick tongue development (embryonic day D5) impaired chondrogenesis, or cartilage formation, while its tissue-specific absence in mice (with neural-crest inactivated Osr-1) caused the formation of ectopic cartilage in the anterior tongue (on embryonic day E12.5).

The authors of this paper used the RNA-Seq workflow from Avadis NGS as one of the tools to demonstrate differential expression of genes during tongue morphogenesis at day E12 between mice with neural-crest specific inactivation of the Osr-1 gene and their control littermates. They demonstrated that the expression of genes involved in chondrogenesis, such as the SRY-box containing gene-9 (Sox9) and Sox5 (and their downstream target genes), and a number of chondrocyte collagens were up-regulated in the central tongue region, while Osr-1 mRNA levels were down-regulated in these mice with neural-crest inactivated Osr-1 compared to those in control mice. With RT-PCR data and in-situ hybridization results to corroborate their RNA-Seq data, the authors were able to show that Osr-1 is a negative regulator of Sox9 and hence of cartilage formation.

This paper provides an elegant answer to the question of how tongue morphogenesis changed evolutionarily with the expression of an extra gene (here Osr-1) at a specific developmental stage. RNA-Seq analysis was thus similarly used in this second paper to elucidate differential expression of genes in a given experimental system.

Next time, I will discuss how microarray data analysis and RNA-Seq analysis fared in a head-to-head contest when they were used to map changes in gene expression after mice with Gaucher’s disease were treated with 2 different bio-similars. So don’t forget to log back in...

From yeast to mice to chicks to stem cells – 4 ways RNA-Seq analysis sheds light on the world around us (a four part series) – Part I

Last month, as is my wont, I was taking a look at publications that had very recently used our software for analysis of NGS data. Of these, four were by groups who had used the RNA-Seq workflow from the Avadis NGS suite to run the analysis of their data. Intrigued by the fact that this research had been done in 4 diverse biological systems, I decided to take a closer look at their work. I was hoping to discover how the RNA-Seq tool had been adapted (if differently) for each of these research problems.

Starting with research from a group elucidating the yeast mitochondrial transcriptome, I moved on to a paper investigating genes involved in tongue morphogenesis in birds vs mice, to work by a group demonstrating how the use of 2 bio-similars altered the transcriptome differently in mice suffering from Gaucher’s disease, and finally to research that showed that the knockdown of the housekeeping gene HPRT in murine embryonic stem cells altered developmental and metabolic pathways during neuronal differentiation. What follows is a 4-part series which offers a brief synopsis of the experimental work done in each system, and the part RNA-Seq data had to play in answering questions raised by the work.

In the first paper on my list, Turk E.M. et al. used bioinformatics and RNA-Seq analysis to map out the mitochondrial transcriptome of the Saccharomyces cerevisiae (yeast) reference strain S288C. Since yeast is one of the organisms popularly used to model mitochondrial genetics, the publication of its mitochondrial transcriptome is important for future research in this field. The transcriptome, as per the authors, is basically “a parts list of all RNAs of a system and a description of their boundaries, their physical location on the genome, and their abundance”. In order to map it, they used a combination of in-silico methods, RNA-Seq analysis, and RT-qPCR. This helped them to correct all promoter, origin of replication (ori) and tRNA annotations; aided in estimating the expression levels of all mitochondrial transcripts; demonstrated the presence of alternative splicing; and helped to determine the identity of a ribonuclease (RNase) that potentially sculpted some of this landscape. Here is my perspective on the role played by the RNA-Seq analysis in the context of their bigger mapping effort.

First some methodology: after sequencing RNA using an Illumina HiSeq2000 platform, and mapping reads with Bowtie, Avadis NGS was used to visualize and analyze mapped reads. Differential expression of genes was demonstrated using the DESeq statistical method (this is an R script that can be set up to run within the Avadis NGS suite).

The primary role played by the RNA-Seq data analysis in this paper involved quantifying the abundance of different RNAs in yeast’s mitochondrial transcriptome by analyzing the expression of 35 gene products under conditions of mitochondrial activation. However, by simultaneously measuring all the mRNA, rRNA, tRNA and other ncRNA, the authors could not optimize for the detection of transcription start sites (requires tRNA and rRNA subtraction for deep coverage), RNA 3’ ends or active ori using RNA-Seq analysis.

Interestingly, the authors were still able to use RNA-Seq data to precisely map the 3’ terminal nucleotide of the 24 mitochondrial tRNAs, because a non-encoded tri-nucleotide, CCA, is added post-transcriptionally to all their 3’ ends. Additionally, after deriving a 20-base promoter sequence and a 5-part consensus sequence as the sequence for active ori, the authors were able to use RNA-Seq analysis to show that out of the 8 putative ori in the yeast mitochondrial transcriptome, only 3 were active. The other 5 appeared disabled due to the presence of insertions within the promoter sequence; this could be inferred from the far fewer RNA-Seq reads associated with these five ori.
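The CCA trick can be sketched in a few lines: if a read ends in CCA and the bases before the CCA match the genome while the CCA itself does not continue in the genome, the last matching base marks the tRNA’s encoded 3’ end. The code below is a toy illustration of that idea with exact string matching and made-up sequences; it is not the authors’ method, which worked from aligned reads.

```python
def trna_three_prime_end(genome, read, tail="CCA"):
    """Return the 0-based genome index of the last encoded base of a read
    that carries a non-encoded 3' tail, or None if the read does not fit."""
    if not read.endswith(tail):
        return None
    body = read[:-len(tail)]
    if not body:
        return None
    start = genome.find(body)  # exact match only, for simplicity
    if start == -1:
        return None
    end = start + len(body) - 1
    # If the genome itself continues with the tail, the CCA may be encoded,
    # so the read does not pin down the 3' end.
    if genome[end + 1:end + 1 + len(tail)] == tail:
        return None
    return end

# Hypothetical genome fragment and read, invented for illustration
genome = "TTTTGGGACGTACGATTTTT"
read = "GGACGTACGACCA"
print(trna_three_prime_end(genome, read))
```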

Another interesting result from the RNA-Seq experiments was confirmation that for genes present on the same primary transcript, relative expression levels were different. RNA-Seq data was also used to demonstrate alternative splicing for the first time in yeast mitochondrial RNA. Additionally, it could be used to show that all yeast mitochondrial proteins in strain S288C used AUG as their start codon instead of AUA. So in spite of the limitations posed by the methodology in this paper, the RNA-Seq tool from Avadis NGS was used in smart ways to get meaningful information about the transcriptome. The authors were also able to establish the role of Dis3p as a mitochondrial RNase whose absence allowed the detection of RNA sequences such as introns and antisense transcripts (or mirror RNAs) in Dis3p mutants, again using RNA-Seq analysis.

The authors of this paper thus coupled basic quantification of RNA reads and some differential expression data from RNA-Seq analysis, with other results, to offer the reader a fairly complete view of the yeast mitochondrial transcriptome.

Next week, I take a look at how RNA-Seq from Avadis NGS was used to show that the alteration in expression of certain genes during development changed the phenotype of tongues in mice compared to birds: a story of class differences indeed!

A Sneak Peek at Avadis NGS Version 1.5’s New Features!

The release date for the next version of Avadis NGS is just around the corner. Not only does the software include a lot of refinements to some of the older features, but it also boasts a number of cool new ones! So here is a sneak peek at some of these features in version 1.5.

As a cancer biologist, I am really looking forward to the new Copy Number Detection tool. The discovery in the last decade that CNVs are very common in the human genome, with a frequency greater than 10%, has changed how biologists perceive their importance. This means that in addition to SNPs and Indels, these larger structural variations in human or even, say, mouse DNA could explain the amazing diversity within each species. CNVs can totally disrupt the transcription of genes, either through the deletion or duplication of the gene, or by the disruption of its regulatory site. For this reason, several research groups across the world are trying to connect the dots between CNVs and disease/cancer loci. The Avadis NGS team is particularly excited about the copy number workflow, because it corrects data for sample ploidy and also for contamination by normal cellular material in tumor samples. This will provide better estimates of CNV regions in the genome. Certainly a tool worth watching out for!
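To see why ploidy and normal-cell contamination matter, consider a simple mixture model (a common textbook-style formulation, used here as an assumption; it is not a description of the algorithm inside Avadis NGS). If a fraction p of the cells in the sample are tumor cells with average ploidy P, and the rest are normal diploid cells, then a region with c copies in the tumor is expected to show a tumor-to-normal depth ratio of (p*c + 2*(1-p)) / (p*P + 2*(1-p)). Inverting this gives a corrected copy number estimate.

```python
def expected_ratio(copy_number, purity, tumor_ploidy, normal_ploidy=2.0):
    """Expected tumor/normal depth ratio for a region with the given tumor
    copy number, in a sample where a fraction purity of cells are tumor."""
    region = purity * copy_number + (1.0 - purity) * normal_ploidy
    average = purity * tumor_ploidy + (1.0 - purity) * normal_ploidy
    return region / average

def corrected_copy_number(ratio, purity, tumor_ploidy, normal_ploidy=2.0):
    """Invert expected_ratio to estimate the tumor copy number."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    average = purity * tumor_ploidy + (1.0 - purity) * normal_ploidy
    return (ratio * average - (1.0 - purity) * normal_ploidy) / purity

# Toy numbers: a 4-copy region in a diploid tumor with 50% normal contamination
ratio = expected_ratio(copy_number=4, purity=0.5, tumor_ploidy=2.0)
print(ratio)                                    # observed depth ratio
print(ratio * 2.0)                              # naive estimate ignoring contamination
print(corrected_copy_number(ratio, 0.5, 2.0))   # estimate after correction
```

In this toy case the naive estimate understates the amplification, because half of the reads come from normal cells that carry only two copies.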

The team has also tried to make more pipelines available to the end-user, allowing compute-intensive jobs to run in the background. So one could potentially be running a complete workflow from raw-read alignments (for DNA-Seq, RNA-Seq or ChIP-Seq) to variant calling in these automated pipelines, and writing a blog at the same time!

Another feature that is a particular favorite of mine is an Elastic Genome Browser that the Avadis team has come up with. Who doesn’t appreciate good visualization of data? Remember how we nodded off when someone presented a seminar with too many text slides? The new elastic genome browser visualization tool in version 1.5 allows a user to look at multiple genomic regions simultaneously. So if you are looking at data to identify gene fusions, novel splices or large structural variations, this tool will help you put all the information on one screen, and will also let you look at only those regions that interest you.

For those users heavily invested in DNA sequencing, the software has been upgraded considerably. It performs additional pre-processing steps such as local realignment and base quality score recalibration. While local realignment will ensure that sequenced reads have fewer alignment artifacts, both these refinements will help reduce false positives. Overall, the workflow will result in improved variant calling.

And last but by no means least, the Avadis NGS team has turned a nice trick with the Alignment tool. In the new version, the tool runs up to twice as fast on machines with 8GB or more of memory. In the case of DNA alignment, the new version further speeds up the computation by utilizing the SSE feature of the processor architecture.

I am really excited about all these features and can’t wait to see them all come together!

The Need for Realignment to dbSNP

We’ve been using Avadis NGS to analyze a number of clinical samples and ran into an interesting case for InDel detection that could lead to false interpretation unless handled properly. The case at hand involves a child who suffered from Pulmonary Hypertension, Pulmonary Infections, and a few other abnormalities, all at birth or in the first year of life, and subsequently died in the second year. We had DNA from this child as well as from his (consanguineous) parents. All three were subjected to Exome sequencing.

We used Avadis NGS to align the data and then performed variant calling. Next, we used the Find Significant Variants functionality in Avadis NGS to identify variants which were heterozygous in the parents but homozygous in the child, and then restricted these variants to those with allele frequencies below 1% using the Find Damaging Variants operation. This yielded a hundred or so gene candidates. We then picked out genes known to cause rare childhood diseases.
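The filtering logic described above can be sketched as follows. This is only an illustration of the criteria (heterozygous in both parents, homozygous in the child, allele frequency below 1%), not the implementation behind Find Significant Variants or Find Damaging Variants. The record layout, field names, and toy variants are all made up for the example, and treating variants with no known allele frequency as rare is our own assumption.

```python
def recessive_candidates(variants, max_af=0.01):
    """Keep variants that are homozygous in the child, heterozygous in both
    parents, and rare (or absent) in the population."""
    kept = []
    for v in variants:
        af = v.get("af")  # None means no frequency is known for this variant
        is_rare = af is None or af < max_af
        if (v["child"] == "hom_alt" and v["mother"] == "het"
                and v["father"] == "het" and is_rare):
            kept.append(v)
    return kept

# Hypothetical variant records, invented for illustration
toy_variants = [
    {"gene": "GENE_A", "child": "hom_alt", "mother": "het", "father": "het", "af": None},
    {"gene": "GENE_B", "child": "hom_alt", "mother": "het", "father": "het", "af": 0.25},
    {"gene": "GENE_C", "child": "het", "mother": "het", "father": "hom_ref", "af": 0.001},
    {"gene": "GENE_D", "child": "hom_alt", "mother": "het", "father": "het", "af": 0.004},
]
print([v["gene"] for v in recessive_candidates(toy_variants)])
```

Note that the allele frequency lookup is only as good as the matching between your variant and the database entry, which is exactly where the case described in this post went wrong.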

One such gene was ALMS1, which is involved in Alstrom Syndrome, a disease that is known to be associated with the occurrence of recurrent pulmonary infections. The mutation in ALMS1 was a CTC insertion, heterozygous in both parents and homozygous in the child. This mutation causes the insertion of an extra amino acid P (=CCT). It also appeared to be a novel mutation not reported in dbSNP, warranting further investigation.

By habit, we also look at dbSNP variants in the vicinity of our mutation of interest just so we can explain the presence of common polymorphisms, if any, close to our potentially deleterious mutation. In this case, there was indeed a dbSNP insertion close-by, as shown in the picture below.

This dbSNP variant had the exact same insertion sequence, i.e., CTC, but appeared 3 bases away from our mutation. And it so happened that the 3 intervening characters were CTC as well! In other words, our mutation and the dbSNP variant were just two different ways of calling the same variant; CTC was repeated twice and this could be called as an insertion to the left (our mutation) or an insertion to the right (dbSNP). Further, it so happens that the dbSNP variant is a very common one. Therefore our mutation, far from being novel, was quite common and not a candidate for the case at hand!!
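The equivalence is easy to verify on a toy sequence. In the sketch below, the reference fragment is made up (only the CTC repeat matters), and the left-alignment routine is a generic illustration of insertion normalization, not the realignment code in Avadis NGS. Inserting CTC on either side of an existing CTC yields the same sequence, and shifting both calls as far left as possible gives them the same representation, which is what makes a database lookup reliable.

```python
def apply_insertion(ref, pos, ins):
    """Insert ins into ref immediately before 0-based index pos."""
    return ref[:pos] + ins + ref[pos:]

def left_align_insertion(ref, pos, ins):
    """Shift an insertion as far left as possible without changing the
    resulting sequence; returns the normalized (pos, ins)."""
    while pos > 0 and ref[pos - 1] == ins[-1]:
        ins = ref[pos - 1] + ins[:-1]  # rotate the inserted bases
        pos -= 1
    return pos, ins

ref = "AAGCTCGTT"        # hypothetical reference fragment containing one CTC
left_call = (3, "CTC")   # insertion called to the left of the reference CTC
right_call = (6, "CTC")  # insertion called to the right of the reference CTC

print(apply_insertion(ref, *left_call))
print(apply_insertion(ref, *right_call))
print(left_align_insertion(ref, *left_call), left_align_insertion(ref, *right_call))
```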

This tells us that in addition to local realignment of an InDel across multiple overlapping reads, we also need to realign to dbSNP to identify the true allele frequency of our mutation, an alteration that we will definitely seek to add to Avadis NGS in the near future.
