Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance.
BMC Res Notes. 2014 Oct 22;7(1):747
Authors: Kumar P, Al-Shafai M, Al Muftah WA, Chalhoub N, Elsaid MF, Aleem AA, Suhre K
BACKGROUND: With diminishing costs of next generation sequencing (NGS), whole genome analysis becomes a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers in general not only provide raw genomic reads, but further deliver SNP calls to their clients. However, the question for the user arises whether to use the SNP data as is, or process the raw sequencing data further through more sophisticated SNP calling pipelines with more advanced algorithms.
RESULTS: Here we report a detailed comparison of SNPs called using the popular GATK multiple-sample calling protocol to SNPs delivered as part of a 40x whole genome sequencing project by Illumina Inc of 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases) and compare them to variants provided by the Illumina CASAVA pipeline. GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional variants from GATK are robust for Mendelian consistencies but weak in terms of statistical parameters such as TsTv ratio. However, these additional variants do not make a difference in detecting the causative variants in the studied phenotype.
CONCLUSION: Both pipelines, GATK multi-sample calling and Illumina CASAVA single sample calling, have highly similar performance in SNP calling at the level of putatively causative variants.
PMID: 25339461 [PubMed - as supplied by publisher]
Development and Validation of a 20K Single Nucleotide Polymorphism (SNP) Whole Genome Genotyping Array for Apple (Malus × domestica Borkh).
PLoS One. 2014;9(10):e110377
Authors: Bianco L, Cestaro A, Sargent DJ, Banchi E, Derdak S, Di Guardo M, Salvi S, Jansen J, Viola R, Gut I, Laurens F, Chagné D, Velasco R, van de Weg E, Troggio M
High-density SNP arrays for genome-wide assessment of allelic variation have made high resolution genetic characterization of crop germplasm feasible. A medium density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus). A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs). Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 Kbps. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs.
PMID: 25303088 [PubMed - as supplied by publisher]
A low-density SNP array for analyzing differential selection in freshwater and marine populations of threespine stickleback (Gasterosteus aculeatus).
BMC Genomics. 2014 Oct 6;15(1):867
Authors: Ferchaud AL, Pedersen SH, Bekkevold D, Jian J, Niu Y, Hansen MM
BACKGROUND: The threespine stickleback (Gasterosteus aculeatus) has become an important model species for studying both contemporary and parallel evolution. In particular, differential adaptation to freshwater and marine environments has led to high differentiation between freshwater and marine stickleback populations at the phenotypic trait of lateral plate morphology and the underlying candidate gene Ectodysplacin (EDA). Many studies have focused on this trait and candidate gene, although other genes involved in marine-freshwater adaptation may be equally important. In order to develop a resource for rapid and cost efficient analysis of genetic divergence between freshwater and marine sticklebacks, we generated a low-density SNP (Single Nucleotide Polymorphism) array encompassing markers of chromosome regions under putative directional selection, along with neutral markers for background.
RESULTS: RAD (Restriction site Associated DNA) sequencing of sixty individuals representing two freshwater and one marine population led to the identification of 33,993 SNP markers. Ninety-six of these were chosen for the low-density SNP array, among which 70 represented SNPs under putatively directional selection in freshwater vs. marine environments, whereas 26 SNPs were assumed to be neutral. Annotation of these regions revealed several genes that are candidates for affecting stickleback phenotypic variation, some of which have been observed in previous studies whereas others are new.
CONCLUSIONS: We have developed a cost-efficient low-density SNP array that allows for rapid screening of polymorphisms in threespine stickleback. The array provides a valuable tool for analyzing adaptive divergence between freshwater and marine stickleback populations beyond the well-established candidate gene Ectodysplacin (EDA).
PMID: 25286752 [PubMed - as supplied by publisher]
The Development and Characterization of a 57K SNP Array for Rainbow Trout.
Mol Ecol Resour. 2014 Oct 8;
Authors: Palti Y, Gao G, Liu S, Kent MP, Lien S, Miller MR, Rexroad CE, Moen T
In this paper we describe the development and characterization of the first high density single nucleotide polymorphism (SNP) genotyping array for rainbow trout. The SNP array is publically available from a commercial vendor (Affymetrix). The SNP genotyping quality was high and validation rate was close to 90%. This is comparable to other farm animals and is much higher than previous smaller scale SNP validation studies in rainbow trout. High quality and integrity of the genotypes is evident from sample reproducibility and from nearly 100% agreement in genotyping results from other methods. The array is very useful for rainbow trout aquaculture populations with more than 40,900 polymorphic markers per population. For wild populations that were confounded by a smaller sample size the number of polymorphic markers was between 10,577 and 24,330. Comparison between genotypes from individual populations suggest good potential for identifying candidate markers for populations' traceability. Linkage analysis and mapping of the SNPs to the reference genome assembly provide strong evidence for a wide distribution throughout the genome with good representation in all 29 chromosomes. A total of 68% of the genome scaffolds and contigs were anchored through linkage analysis using the SNP array genotypes, including ~20% of the genome assembly that has not been previously anchored to chromosomes. This article is protected by copyright. All rights reserved.
PMID: 25294387 [PubMed - as supplied by publisher]