BRCA1 Polymorphisms and Breast Cancer Epidemiology in the Western New York Exposures and Breast Cancer (WEB) Study.
Genet Epidemiol. 2013 May 14;
Authors: Ricks-Santi LJ, Nie J, Marian C, Ochs-Balcom HM, Trevisan M, Edge SB, Freudenheim JL, Shields PG
Results of studies for the association of BRCA1 genotypes and haplotypes with sporadic breast cancer have been inconsistent. Therefore, a candidate single nucleotide polymorphism (SNP) approach was used in a breast cancer case-control study to explore genotypes and haplotypes that have the potential to affect protein functions or levels. Genotyping of BRCA1 polymorphisms Q356R, D693N, and E1038G was performed on 1,005 cases and 1,765 controls. Unconditional, polytomous logistic regression and χ² tests were used to examine the associations of breast cancer with genotypes and haplotypes. In addition, interactions between genotype and smoking, benign breast disease, family history of breast cancer, body mass index (BMI), alcohol consumption, hormonal risk factors, hormone receptor status, and breast cancer pathology were also evaluated using logistic regression and χ² tests. Although sporadic breast cancer was not associated with BRCA1 genotypes or haplotypes overall or by menopausal status, there was evidence of interactions of the E1038G BRCA1 genotype with smoking and with BMI among premenopausal women (P for interaction = 0.01 and 0.045, respectively) and of the E1038G and D693N BRCA1 genotypes with hormone therapy use among postmenopausal women (P for interaction = 0.01 and 0.02, respectively). No other associations were found between BRCA1 genotypes and stage, histological grade, or nuclear grade. However, the D693N SNP was associated with the risk of triple negative breast cancer (odds ratio = 2.31, 95% confidence interval 1.08-4.93). The BRCA1 variants studied may play a role in the etiology of triple negative breast cancer and may interact with environmental factors such as hormone therapy or smoking to increase sporadic breast cancer risk.
PMID: 23674270 [PubMed - as supplied by publisher]
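The odds ratio and confidence interval reported for D693N and triple negative breast cancer (2.31, 1.08-4.93) can be sanity-checked with a few lines of arithmetic. The sketch below assumes the interval is a Wald-type interval that is symmetric on the log scale; the abstract does not state how the interval was computed, so the derived standard error, z statistic, and p-value are approximations, not figures from the paper. The function name `wald_summary` is hypothetical.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_summary(odds_ratio, ci_low, ci_high):
    """Back out log-scale quantities from a reported OR and 95% CI.

    Assumes a Wald interval: log(OR) +/- Z_95 * SE (an assumption, see lead-in).
    """
    se = (log(ci_high) - log(ci_low)) / (2 * Z_95)    # SE of log(OR)
    z = log(odds_ratio) / se                          # Wald z statistic
    p_value = erfc(abs(z) / sqrt(2))                  # two-sided normal tail
    midpoint = exp((log(ci_low) + log(ci_high)) / 2)  # geometric mean of bounds
    return {"se": se, "z": z, "p": p_value, "midpoint": midpoint}


# Values taken directly from the abstract above.
summary = wald_summary(2.31, 1.08, 4.93)
print(summary)
```

If the interval is a Wald interval, the geometric mean of its bounds should land close to the reported odds ratio, which is a quick way to catch transcription errors in a reported estimate.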
Joint Genotype Calling With Array and Sequence Data.
Genet Epidemiol. 2012 Jul 20;
Authors: O'Connell J, Marchini J
Analysis of rare variants is currently a major focus of genetic studies of human disease. Single-nucleotide polymorphism (SNP) genotypes can be assayed using microarray genotyping or by sequencing, but neither technology produces perfect genotype calls, especially at rare SNPs. Studies that collect both types of data are becoming increasingly common, so it may be possible to combine data types to increase accuracy. We present a method, called Chiamante, which calls genotypes for individuals who have array data, sequence data, or both. The model adapts to data quality and can estimate when either the array or the sequence data should be ignored when calling the genotypes at each SNP. As a special case, our method will call genotypes from only array data and outperforms existing methods in this scenario. We have applied our method to array and sequence data from Phase I of the 1000 Genomes Project and show that it provides improved performance, especially at rare SNPs. This method provides a foundation for future efforts to fuse genetic data from different sources, for example, when combining data from exome sequencing and exome microarrays.
PMID: 22821426 [PubMed - as supplied by publisher]
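The general idea of combining array and sequence evidence for one genotype call can be illustrated with a generic Bayesian sketch. This is not the Chiamante model, which the abstract does not specify; it simply multiplies a Hardy-Weinberg prior by independent per-source genotype likelihoods. The function names, the binomial read model, the 1% error rate, and the array likelihood values are all assumptions made for illustration.

```python
from math import comb


def hwe_prior(alt_freq):
    """Prior over genotypes (0, 1, 2 alt alleles) under Hardy-Weinberg."""
    q = alt_freq
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def sequence_likelihoods(ref_reads, alt_reads, error=0.01):
    """Binomial likelihood of the read counts under each genotype."""
    n = ref_reads + alt_reads
    alt_prob = (error, 0.5, 1.0 - error)  # P(read shows alt | genotype)
    return [comb(n, alt_reads) * p ** alt_reads * (1 - p) ** ref_reads
            for p in alt_prob]


def call_genotype(prior, *likelihoods):
    """Multiply the prior by each source's likelihoods and normalise."""
    post = list(prior)
    for lik in likelihoods:
        post = [p * l for p, l in zip(post, lik)]
    total = sum(post)
    post = [p / total for p in post]
    best = max(range(3), key=post.__getitem__)
    return best, post


prior = hwe_prior(0.01)                  # a rare SNP
seq = sequence_likelihoods(1, 1)         # low coverage: one ref read, one alt read
array = [0.2, 0.7, 0.1]                  # hypothetical array intensity likelihoods

seq_only_call, _ = call_genotype(prior, seq)
joint_call, joint_post = call_genotype(prior, seq, array)
print(seq_only_call, joint_call, joint_post)
```

In this toy case the low-coverage sequence data alone are not enough to overcome the rare-variant prior, while adding the array evidence moves the call to heterozygous. Leaving a source out of the argument list is the crude equivalent of ignoring it at that SNP.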
An integrative segmentation method for detecting germline copy number variations in SNP arrays.
Genet Epidemiol. 2012 May;36(4):373-83
Authors: Shi J, Li P
Germline copy number variations (CNVs) are a major source of genetic variation in humans. In large-scale studies of complex diseases, CNVs are usually detected from data generated by single nucleotide polymorphism (SNP) genotyping arrays. In this paper, we develop an integrative segmentation method, SegCNV, for detecting CNVs by integrating both log R ratio (LRR) and B allele frequency (BAF). Based on simulation studies, SegCNV had modestly better power to detect deletions and substantially better power to detect duplications compared with circular binary segmentation (CBS), which relies purely on LRRs; and it had better power to detect deletions and comparable performance in detecting duplications compared with PennCNV and QuantiSNP. In two HapMap subjects with deep sequence data available as a gold standard, SegCNV detected more true short deletions than PennCNV and QuantiSNP. For 21 short duplications validated experimentally in the AGRE dataset, SegCNV, QuantiSNP, and PennCNV detected all of them while CBS detected only three. SegCNV is much faster than the hidden Markov model (HMM)-based methods, taking only several seconds to analyze genome-wide data for one subject. Genet. Epidemiol. 36:373-383, 2012. © 2012 Wiley Periodicals, Inc.
PMID: 22539397 [PubMed - in process]
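Why BAF adds information beyond LRR can be shown with a toy classifier for a single pre-defined segment. This is not SegCNV, whose segmentation procedure the abstract does not describe; it only encodes the familiar patterns that a single-copy deletion lowers LRR and removes the heterozygous BAF band, while a single-copy duplication raises LRR and shifts heterozygous BAF values toward 1/3 and 2/3. The function name and all thresholds are assumptions.

```python
from statistics import mean


def classify_segment(lrr, baf, lrr_cut=0.15, het_window=0.1):
    """Label one segment from its LRR and BAF values (illustrative thresholds)."""
    mean_lrr = mean(lrr)
    # SNPs whose BAF is away from the homozygous bands near 0 and 1.
    het_like = [b for b in baf if 0.2 < b < 0.8]
    # Of those, SNPs sitting in the normal two-copy heterozygous band near 0.5.
    near_half = [b for b in het_like if abs(b - 0.5) <= het_window]

    if mean_lrr < -lrr_cut and not het_like:
        return "deletion"
    if mean_lrr > lrr_cut and het_like and not near_half:
        return "duplication"
    return "no_cnv_call"


# Synthetic values, made up for illustration only.
normal = classify_segment([0.02, -0.03, 0.01, 0.0], [0.0, 0.5, 1.0, 0.48])
deletion = classify_segment([-0.5, -0.6, -0.45, -0.55], [0.0, 1.0, 0.02, 0.98])
duplication = classify_segment([0.3, 0.35, 0.28, 0.32], [0.0, 0.33, 0.67, 1.0])
print(normal, deletion, duplication)
```

A rule based on LRR alone would have to rely on the modest intensity shift of a duplication, whereas the BAF split gives a second, independent signal. Note that this toy rule cannot call a duplication in a segment containing only homozygous SNPs.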
Pitfalls of merging GWAS data: lessons learned in the eMERGE network and quality control procedures to maintain high data quality.
Genet Epidemiol. 2011 Dec;35(8):887-98
Authors: Zuvich RL, Armstrong LL, Bielinski SJ, Bradford Y, Carlson CS, Crawford DC, Crenshaw AT, de Andrade M, Doheny KF, Haines JL, Hayes MG, Jarvik GP, Jiang L, Kullo IJ, Li R, Ling H, Manolio TA, Matsumoto ME, McCarty CA, McDavid AN, Mirel DB, Olson LM, Paschall JE, Pugh EW, Rasmussen LV, Rasmussen-Torvik LJ, Turner SD, Wilke RA, Ritchie MD
Genome-wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient reuse of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute-supported consortium composed of five sites that perform various genetic association studies using DNA repositories and EMR systems. The eMERGE sites have developed EMR-based algorithms covering a core set of 14 phenotypes for extraction of study samples from each site's DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or the Center for Inherited Disease Research using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample and marker quality and various batch effects. Upon completion of the genotyping and QC analyses for each site's primary study, the eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset reentered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here, we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion of the eMERGE QC pipeline for merged datasets. These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II, and will also serve as a starting point for investigators merging multiple genotype datasets accessible through the National Center for Biotechnology Information in the database of Genotypes and Phenotypes. Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process. Genet. Epidemiol. 35:887-898, 2011. © 2011 Wiley Periodicals, Inc.
PMID: 22125226 [PubMed - in process]
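One kind of check that is commonly run when merging genotype data from different centers can be sketched as follows. This is not the eMERGE pipeline, whose specific steps the abstract does not list; it shows two generic marker-level checks, per-center call rate and between-center allele frequency agreement. The function names, thresholds, marker names, and genotype values are hypothetical.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype (None = missing)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def alt_allele_freq(genotypes):
    """Alt allele frequency from 0/1/2 genotype codes, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called)) if called else None


def flag_markers(center_a, center_b, min_call_rate=0.95, max_freq_diff=0.10):
    """Return {marker: [reasons]} for markers that fail either check."""
    flags = {}
    for marker in sorted(center_a.keys() & center_b.keys()):
        reasons = []
        for name, data in (("A", center_a), ("B", center_b)):
            if call_rate(data[marker]) < min_call_rate:
                reasons.append(f"low call rate at center {name}")
        fa = alt_allele_freq(center_a[marker])
        fb = alt_allele_freq(center_b[marker])
        if fa is not None and fb is not None and abs(fa - fb) > max_freq_diff:
            # A large gap can indicate a batch effect or an allele coding flip.
            reasons.append("allele frequency differs between centers")
        if reasons:
            flags[marker] = reasons
    return flags


# Synthetic genotypes for ten samples per center, made up for illustration.
center_a = {
    "snp1": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    "snp2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "snp3": [0, None, 1, 0, None, 0, 0, 1, 0, 0],
}
center_b = {
    "snp1": [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    "snp2": [2, 2, 1, 2, 2, 2, 1, 2, 2, 2],
    "snp3": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
}

flags = flag_markers(center_a, center_b)
print(flags)
```

A frequency comparison like this assumes the two centers genotyped samples from broadly similar populations; where ancestry differs between sites, a real difference in allele frequency would be flagged as well.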