Analyzing Next-Generation Sequencing Data
June 6th – June 17th, 2011
Kellogg Biological Station, Michigan State University
Instructors: Dr. C. Titus Brown, Dr. Ian Dworkin, and Dr. Istvan Albert.
Applications must be received by March 25th for full consideration.
More information and application link here:
This intensive two-week summer course will introduce students with a strong
biology background to the practice of analyzing short-read sequencing data from
Roche 454, Illumina GA2, ABI SOLiD, Pacific Biosciences, and other next-gen
platforms. The first week will introduce students to computational thinking and
large-scale data analysis on UNIX platforms. The second week will focus on
mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq,
No prior programming experience is required, although familiarity with some
programming concepts is helpful, and bravery in the face of the unknown is
necessary. 2 years or more of graduate school in a biological science is
Students will gain practical experience in:
* Python and bash shell scripting
* cloud computing/Amazon EC2
* basic software installation on UNIX
* installing and running maq, bowtie, and velvet
* querying mappings and evaluating assemblies
Materials from last year’s course are available at http://ged.msu.edu/angus/
under a Creative Commons/use+reuse license.
Ian Dworkin firstname.lastname@example.org
By Monica Heger
Researchers from Harvard University and the University of Melbourne have used candidate gene-prediction algorithms combined with targeted sequencing on the Illumina Genome Analyzer to identify novel causal mutations in the mitochondrial disease human complex I deficiency, a respiratory disorder that causes skeletal muscle myopathy, cardiomyopathy, hypotonia, and other clinical manifestations.
In the study, published this week in Nature Genetics, the researchers sequenced 103 candidate genes in a cohort of 103 cases and 42 controls. In 60 of the cases, there had not been a molecular diagnosis, and the researchers were able to uncover the molecular cause in 13 of those cases, including identifying two previously unreported causal mutations. In total, the team identified 47 unique mutations in 20 different genes that appear to be associated with the disease.
The researchers said the method could be a good way to identify causal mutations for complex diseases because it enables the sequencing of many different genes in larger cohorts, without being prohibitively expensive.
“I think approaches like this will be popular in the next few years for certain groups of disease, such as heart disease, mental retardation, neurological disease, and cancer,” said David Thorburn, head of mitochondrial research at Murdoch Childrens Research Institute in Melbourne and a senior author of the study.
Those diseases have a strong genetic component, but typically involve hundreds of genes — unlike Mendelian diseases, for which whole-genome and whole-exome sequencing have worked well to find causal mutations by sequencing only a small number of related individuals (IS 3/16/2010 and 9/29/2009).
In the Nature Genetics study, the researchers first identified 103 genes they wanted to target. They began with 45 genetic subunits known to be involved in the enzymatic activity of the human complex I, said Vamsi Mootha, an associate professor of systems biology at Harvard Medical School and senior author of the paper. “We then used a phylogenetic strategy to identify additional assembly factors,” he said. The team looked at the evolutionary history of complex I, comparing organisms that have the complex to those that don’t, to determine which other genes are likely to be involved in the disease.
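The phylogenetic strategy Mootha describes can be illustrated with a toy presence/absence comparison. The sketch below is not the authors' method, which the article does not spell out; it simply scores each gene by how often its presence across species agrees with the presence of complex I. The species labels, gene names, and the `profile_match` function are all hypothetical.

```python
def profile_match(gene_presence, complex_presence):
    """Return the fraction of species where gene and complex I presence agree.

    Both arguments map a species label to True (present) or False (absent).
    A species missing from gene_presence is treated as lacking the gene.
    """
    agree = sum(
        gene_presence.get(species, False) == has_complex
        for species, has_complex in complex_presence.items()
    )
    return agree / len(complex_presence)


# Hypothetical data: which species carry complex I at all.
complex_presence = {"sp1": True, "sp2": True, "sp3": False, "sp4": False}

# Hypothetical gene profiles across the same species.
gene_profiles = {
    "gene_x": {"sp1": True, "sp2": True, "sp3": False, "sp4": False},
    "gene_y": {"sp1": True, "sp2": True, "sp3": True, "sp4": True},
}

scores = {
    gene: profile_match(profile, complex_presence)
    for gene, profile in gene_profiles.items()
}

# Genes whose profile tracks complex I most closely rank first.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

A gene found everywhere, like `gene_y` here, scores poorly because its presence says nothing specific about complex I; a gene that is lost wherever the complex is lost scores highest.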
They then combined the DNA into five different pools for the cases and two pools for the HapMap controls, and performed PCR amplification reactions to capture the 103 genes, which comprised 145 kilobases of sequence. The resulting amplicons were then sequenced on the Illumina GA with 76-base single-end reads, to an average 168-fold coverage per individual.
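As a back-of-the-envelope check on these figures, the sketch below estimates how many 76-base reads are needed per individual to reach 168-fold average coverage over a 145-kilobase target. It assumes every read maps entirely on target and ignores the effects of pooling, neither of which the article quantifies; the function name is illustrative.

```python
import math


def reads_for_coverage(target_bases, read_length, mean_coverage):
    """Return the number of reads needed to reach a given mean coverage.

    Assumes every read falls fully within the target (an idealization).
    """
    return math.ceil(mean_coverage * target_bases / read_length)


# Figures quoted in the article: 145 kb target, 76-base reads, 168-fold coverage.
reads_per_individual = reads_for_coverage(
    target_bases=145_000, read_length=76, mean_coverage=168
)
print(reads_per_individual)
```

Under these assumptions the answer is on the order of a few hundred thousand reads per individual, which is why many samples could share a single sequencing run.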
Mootha said that since doing the experiment, there have been a number of technology developments that make the protocol easier and more accurate. For instance, the team is now using custom designed reagents on Agilent’s SureSelect platform instead of PCR amplification for target enrichment. Also, in the current study, the team did not barcode its samples before pooling, so after they did variant calling, they had to go back and match the variants to the individual.
The team called 898 single nucleotide variants and indels. They then filtered out variants present in healthy individuals, synonymous variants, non-coding variants that were not associated with splice sites or tRNA, and missense variants at sites with low evolutionary conservation. That narrowed the list down to around 200 variants, and the team then validated 151 likely deleterious variants.
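The filtering steps described above can be sketched as a simple predicate over variant records. This is a minimal illustration under stated assumptions, not the study's pipeline: the `Variant` fields, the conservation score, its 0.8 threshold, and the example records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    effect: str                   # e.g. "synonymous", "missense", "noncoding", "nonsense"
    in_controls: bool             # seen in healthy individuals
    affects_splice_or_trna: bool  # only meaningful for non-coding variants
    conservation: float           # hypothetical score from 0 (variable) to 1 (conserved)


def is_likely_deleterious(v, min_conservation=0.8):
    """Apply the four exclusion filters described in the article, in order."""
    if v.in_controls:
        return False
    if v.effect == "synonymous":
        return False
    if v.effect == "noncoding" and not v.affects_splice_or_trna:
        return False
    if v.effect == "missense" and v.conservation < min_conservation:
        return False
    return True


# Hypothetical variant calls, one per filter plus two that should pass.
calls = [
    Variant("gene_a", "missense", False, False, 0.95),   # kept
    Variant("gene_b", "missense", True, False, 0.95),    # dropped: in controls
    Variant("gene_c", "synonymous", False, False, 0.90), # dropped: synonymous
    Variant("gene_d", "noncoding", False, False, 0.50),  # dropped: not splice/tRNA
    Variant("gene_e", "missense", False, False, 0.20),   # dropped: low conservation
    Variant("gene_f", "noncoding", False, True, 0.50),   # kept: splice/tRNA site
]

kept = [v.gene for v in calls if is_likely_deleterious(v)]
print(kept)
```

Applying the cheap filters first (control presence, synonymous changes) trims the list before the conservation check, which in practice would need an alignment across species.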
They then looked at the variants in the 60 cases lacking a molecular diagnosis for known pathogenic mitochondrial DNA mutations, including homozygous and compound heterozygous variants. Three individuals had previously reported pathogenic mitochondrial mutations and eight had recessive-type mutations in known disease genes. Additionally, two individuals had recessive-type mutations in candidate disease genes NUBPL and FOXRED1.
The thirteen mutations, including the two mutations in NUBPL and FOXRED1, which were previously not associated with the disease, were all confirmed as disease-causing. When the researchers repaired the mutation in patients’ fibroblasts, the complex I was no longer deficient.
“We now have 56 patients with complex I deficiency with molecular diagnoses. These diagnoses comprise 47 unique mutations in 20 different genes,” said Thorburn. “For comparison, a ‘simple’ genetic disease such as cystic fibrosis is always caused by mutations in one gene, and 95 percent of patients have the same mutation.”
Thorburn said that the team is continuing to follow the group of patients to try to identify further mutations that could be used for molecular diagnoses. He said they will continue to use sequencing, and also array-CGH, to look for additional mutations.
“It is likely that some of our patients have mutations in genes not included” in the initial set of 103 genes, he said, so they are also expanding the list of genes. Additionally, they are looking for interactions between mutations in different genes. He said he will continue to focus on mitochondrial diseases.
Mootha added that the study could have implications for other diseases as well. “There are a fair number of common human disorders that are linked mechanistically to complex I including Parkinson’s and type 2 diabetes,” he said. “The hope is that identifying the genes underlying the severe phenotypes will help understand these other disorders.”
CLC Bio, Danish genomics service provider Aros Applied Biotechnology, Roche, and Aarhus University Hospital are collaborating to develop a high-throughput platform for collecting, sequencing, and analyzing DNA extracted from formalin-fixed paraffin-embedded tissue samples, CLC Bio said this week.
Under the $5 million project, for which CLC and the Danish National Advanced Technology Foundation will each contribute half the funding, the partners plan to develop a platform for selecting appropriate FFPE samples, choosing an optimal sequencing technology, and assembling and analyzing the sequence data. The goal is to apply the platform in molecular diagnostics research and to re-analyze samples from preclinical trials where drugs have failed.
According to CLC Bio, the proposed platform will allow researchers to access more samples than those available through fresh tissue biobanks, and link the results to patient data.