Modelling and visualizing fine-scale linkage disequilibrium structure.
BMC Bioinformatics. 2013 Jun 6;14(1):179
Authors: Edwards D
BACKGROUND: Detailed study of genetic variation at the population level in humans and other species is now possible due to the availability of large sets of single nucleotide polymorphism data. Alleles at two or more loci are said to be in linkage disequilibrium (LD) when they are correlated or statistically dependent. Current efforts to understand the genetic basis of complex phenotypes are based on the existence of such associations, making study of the extent and distribution of linkage disequilibrium central to this endeavour. The objective of this paper is to develop methods to study fine-scale patterns of allelic association using probabilistic graphical models. RESULTS: An efficient, linear-time forward-backward algorithm is developed to estimate chromosome-wide LD models by optimizing a penalized likelihood criterion, and a convenient way to display these models is described. To illustrate the methods, they are applied to data obtained by genotyping 8341 pigs. It is found that roughly 20% of the porcine genome exhibits complex LD patterns, forming islands of relatively high genetic diversity. CONCLUSIONS: The proposed algorithm is efficient and makes it feasible to estimate and visualize chromosome-wide LD models on a routine basis.
PMID: 23742095 [PubMed - as supplied by publisher]
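The abstract above defines LD as correlation between alleles at different loci. As a minimal sketch of that idea for the simplest case (two biallelic loci, phased haplotypes coded 0/1), the standard pairwise measures D and r^2 can be computed as follows. This is not the graphical-model method of the paper; the function name pairwise_ld and the toy haplotypes are assumptions made for illustration.

```python
def pairwise_ld(hap_a, hap_b):
    """Return (D, r2) for two biallelic loci.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (hypothetical input format for this sketch).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n                                 # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                                 # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                 # D: deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")    # undefined for monomorphic loci
    return d, r2


# Toy data (made up): eight haplotypes at two loci.
hap_a = [1, 1, 0, 0, 1, 0, 1, 0]
hap_b = [1, 1, 0, 0, 1, 0, 0, 1]
d, r2 = pairwise_ld(hap_a, hap_b)
print(d, r2)
```

An r^2 of 0 indicates no correlation between the two loci and 1 indicates complete correlation; a chromosome-wide model such as the one described above has to capture dependencies beyond single pairs.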
Bivariate segmentation of SNP-array data for allele-specific copy number analysis in tumour samples.
BMC Bioinformatics. 2013;14:84
Authors: Mosén-Ansorena D, Aransay AM
BACKGROUND: SNP arrays output two signals that reflect the total genomic copy number (LRR) and the allelic ratio (BAF), which in combination allow the characterisation of allele-specific copy numbers (ASCNs). While methods based on hidden Markov models (HMMs) have been extended from array comparative genomic hybridisation (aCGH) to jointly handle the two signals, only one method based on change-point detection, ASCAT, performs bivariate segmentation.
RESULTS: In the present work, we introduce a generic framework for bivariate segmentation of SNP array data for ASCN analysis. To this end, we discuss the characteristics of the typically applied BAF transformation and how they affect segmentation, introduce concepts of multivariate time series analysis that are relevant to this field and discuss the appropriate formulation of the problem. The framework is implemented in a method named CnaStruct, the bivariate form of the structural change model (SCM), which has been successfully applied to transcriptome mapping and aCGH.
CONCLUSIONS: On a comprehensive synthetic dataset, we show that CnaStruct outperforms the segmentation of existing ASCN analysis methods. Furthermore, CnaStruct can be integrated into the workflows of several ASCN analysis tools in order to improve their performance, especially on tumour samples highly contaminated by normal cells.
PMID: 23497144 [PubMed - in process]
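The abstract above refers to a "typically applied BAF transformation" without naming it. One transformation widely used before segmentation is mirroring the BAF around 0.5 and dropping likely homozygous SNPs; the sketch below assumes that this is the kind of transformation meant. The function name mirror_baf and the hom_threshold cut-off of 0.9 are hypothetical choices for illustration, not values taken from the paper.

```python
def mirror_baf(baf, hom_threshold=0.9):
    """Fold B allele frequencies around 0.5 (mirrored BAF).

    baf: sequence of BAF values in [0, 1].
    hom_threshold: hypothetical cut-off; mirrored values at or above it are
    treated as homozygous (uninformative) and returned as None.
    """
    mirrored = []
    for value in baf:
        m = abs(value - 0.5) + 0.5      # maps e.g. 0.3 and 0.7 to the same value
        mirrored.append(m if m < hom_threshold else None)
    return mirrored


# Toy values (made up): two heterozygous SNPs in an imbalanced region,
# one balanced heterozygous SNP, and two homozygous SNPs.
print(mirror_baf([0.3, 0.7, 0.5, 0.02, 0.98]))
```

Mirroring turns the two symmetric heterozygous bands into a single band, so a change in allelic imbalance appears as a shift in level, which is what a segmentation method looks for.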
Fast detection of de novo copy number variants from SNP arrays for case-parent trios.
BMC Bioinformatics. 2012 Dec 12;13(1):330
Authors: Scharpf RB, Beaty TH, Schwender H, Younkin SG, Scott AF, Ruczinski I
ABSTRACT: BACKGROUND: In studies of case-parent trios, we define copy number variants (CNVs) in the offspring that differ from the parental copy numbers as de novo and of interest for their potential functional role in disease. Among the leading array-based methods for discovery of de novo CNVs in case-parent trios is the joint hidden Markov model (HMM) implemented in the PennCNV software. However, the computational demands of the joint HMM are substantial and the extent to which false positive identifications occur in case-parent trios has not been well described. We evaluate these issues in a study of oral cleft case-parent trios. RESULTS: Our analysis of the oral cleft trios reveals that genomic waves represent a substantial source of false positive identifications in the joint HMM, despite a wave-correction implementation in PennCNV. In addition, the noise of low-level summaries of relative copy number (log R ratios) is strongly associated with batch and correlated with the frequency of de novo CNV calls. Exploiting the trio design, we propose a univariate statistic for relative copy number referred to as the minimum distance that can reduce technical variation from probe effects and genomic waves. We use circular binary segmentation to segment the minimum distance and maximum a posteriori estimation to infer de novo CNVs from the segmented genome. Compared to PennCNV on simulated data, MinimumDistance identifies fewer false positives on average and is comparable to PennCNV with respect to false negatives. Genomic waves contribute to discordance of PennCNV and MinimumDistance for high coverage de novo calls, while highly concordant calls on chromosome 22 were validated by quantitative PCR. Computationally, MinimumDistance provides a nearly 8-fold increase in speed relative to the joint HMM in a study of oral cleft trios.
CONCLUSIONS: Our results indicate that batch effects and genomic waves are important considerations for case-parent studies of de novo CNV, and that the minimum distance is an effective statistic for reducing technical variation contributing to false de novo discoveries. Coupled with segmentation and maximum a posteriori estimation, our algorithm compares favorably to the joint HMM with MinimumDistance being much faster.
PMID: 23234608 [PubMed - as supplied by publisher]
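The abstract above names a per-marker "minimum distance" statistic but does not give its formula. The sketch below assumes one plausible reading: at each marker, take the offspring-minus-parent difference in log R ratio that is smaller in absolute value, keeping its sign. The function name, argument layout and toy values are hypothetical, and the segmentation and maximum a posteriori steps are not shown.

```python
def minimum_distance(offspring, father, mother):
    """Per-marker signed difference to the closer parent (assumed definition).

    offspring, father, mother: equal-length sequences of log R ratios,
    one value per marker.
    """
    distances = []
    for o, f, m in zip(offspring, father, mother):
        d_father = o - f
        d_mother = o - m
        # Keep whichever difference is closer to zero, with its sign.
        distances.append(d_father if abs(d_father) <= abs(d_mother) else d_mother)
    return distances


# Toy log R ratios (made up) for three markers.
offspring = [0.0, -0.6, 0.1]
father = [0.1, 0.0, 0.1]
mother = [-0.05, -0.1, -0.4]
print(minimum_distance(offspring, father, mother))
```

Under this reading the statistic stays near zero whenever the offspring matches at least one parent, so probe effects and waves shared within the trio largely cancel, and only markers where the offspring departs from both parents (the second marker in the toy data) stand out.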
T.I.M.S: TaqMan Information Management System, tools to organize data flow in a genotyping laboratory.
BMC Bioinformatics. 2005;6:246
Authors: Monnier S, Cox DG, Albion T, Canzian F
Single Nucleotide Polymorphism (SNP) genotyping is a major activity in biomedical research. The TaqMan technology is one of the most commonly used approaches. It produces large amounts of data that are difficult to process by hand. Laboratories not equipped with a Laboratory Information Management System (LIMS) need tools to organize the data flow.
PMID: 16221298 [PubMed - indexed for MEDLINE]