Special Reports |
1 Center for Cardiovascular Disease Prevention, Brigham and Women's Hospital, Boston; 2 Donald W Reynolds Center for Cardiovascular Research, Harvard Medical School, Boston; 3 Amgen, Inc, Cambridge, MA.
Address correspondence to this author at: Center for Cardiovascular Disease Prevention, Brigham and Women's Hospital, 900 Commonwealth Ave East, Boston, MA 02215. Fax 617 734-1508; e-mail pridker@partners.org.
| Abstract |
|---|
| Introduction |
|---|
A favored analytic approach for such discovery is the genome-wide association study (GWAS), in which genetic variation across the human genome is compared between patients with different disease states or different risk-factor profiles. Success in GWAS requires a comprehensive knowledge of genome-wide variation and linkage disequilibrium patterns, the availability of dense genotyping chip sets containing several hundred thousand single-nucleotide polymorphisms (SNPs), and the availability of large, well-phenotyped patient populations (4)(5)(6)(7)(8)(9)(10). Appropriate patient populations can take the form of retrospective case-control studies (in which patients with and without existing disease are compared for genetic variation), prospective nested case-control studies (in which incident cases and matched controls who remain free of disease are selected from within an ongoing prospective cohort), or cross-sectional family-based studies (in which affected and unaffected parents and children are evaluated across generations).
A potentially more powerful approach to GWAS is the large-scale prospective cohort study, in which initially healthy individuals are followed over long time periods and assessed for the development of disease, and all members of the cohort undergo comprehensive genotyping. Such full-ascertainment prospective cohort studies avoid bias in the selection of case and control subjects and enable simultaneous evaluation of a large number of environmental exposures and potential disease states in an epidemiologically efficient manner. Large-scale prospective cohort studies are also an optimal setting in which to evaluate the gene-gene and gene-environment interactions likely to be of interest for complex disorders such as cardiovascular disease, stroke, diabetes, and cancer, for which substantive environmental determinants are known. Unlike retrospective case-control or prospective nested case-control study designs, the full-cohort approach also allows the investigation of different diseases simultaneously, can easily include future cases in analyses without concern for ascertainment bias, reduces laboratory variability because the full cohort is genotyped at the same time and in random order, and markedly improves the ability to evaluate gene-environment interactions when environmental exposure is rare. The disadvantages of this approach are the greater expense of cohort assembly and baseline genotyping, as well as the need for comprehensive and ongoing long-term follow-up and endpoint ascertainment. Follow-up periods of a decade or longer are typically required in prospective cohort settings to allow for sufficient accrual of incident disease states.
The Women's Genome Health Study (WGHS) is an ongoing prospective cohort GWAS that derives from the NIH-funded Women's Health Study (WHS) and includes more than 25 000 initially healthy women who have already been followed for more than 12 years for the development of common disorders such as myocardial infarction, stroke, cancer, venous thromboembolism, diabetes, osteoporosis, cognitive decline, and common visual disorders such as age-related macular degeneration and cataracts. Because each WGHS participant is also a WHS participant, full epidemiologic data on a broad range of behavioral, dietary, and environmental risk exposures are available for the study population. In addition, each WGHS participant provided a baseline blood sample that has already been evaluated for multiple disease biomarkers including total, HDL, and LDL cholesterol, triglycerides, apolipoprotein A-I, apolipoprotein B100, lipoprotein(a), homocysteine, high-sensitivity C-reactive protein (hsCRP), soluble intercellular adhesion molecule type-1 (sICAM-1), fibrinogen, creatinine, and hemoglobin A1c. Genomic DNA was also extracted from each baseline blood sample and is now undergoing genotyping for more than 360 000 SNPs using the HapMap-based Human-Hap300 Duo-plus BeadChip platform.
In this report we describe the WGHS and its parent WHS from the perspectives of cohort assembly, follow-up, endpoint validation, baseline plasma phenotyping, DNA extraction, genotyping, participant confidentiality, and power and sample size, and we discuss the WGHS in the context of other ongoing GWAS being performed in related areas.
| cohort assembly and prospective follow-up |
|---|
The WHS was initiated in 1992 to evaluate the balance of benefits and risks of low-dose aspirin and vitamin E in the primary prevention of cardiovascular disease and cancer in women (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14). Since its inception, the study has been continuously funded by the National Heart, Lung, and Blood Institute and the National Cancer Institute, with study agents provided by Bayer and the Natural Source Vitamin E Association.
Between September 1992 and May 1995, letters of invitation to participate in the WHS were sent to more than 1.7 million US female health professionals; 453 787 women completed the questionnaires, with 65 169 initially willing and eligible to enroll. Women were eligible if they were 45 years old or older; had no history of coronary heart disease, cerebrovascular disease, cancer (except nonmelanoma skin cancer), or other major chronic illness; had no history of side effects to study medications; were not taking aspirin or nonsteroidal antiinflammatory medications more than once a week; were not taking anticoagulants or corticosteroids; and were not taking individual supplements of vitamin A, E, or β-carotene more than once a week. Eligible women were enrolled in a 3-month run-in period of placebo administration to identify a group likely to be compliant with long-term treatment and follow-up. A total of 39 876 women were willing, eligible, and compliant during the run-in phase and were randomized in a 2 x 2 factorial design to 1 of 4 treatment groups: active aspirin (100 mg orally every other day) and vitamin E placebo, aspirin placebo and active vitamin E (600 IU orally every other day), both active agents, or both placebos.
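The 2 × 2 factorial allocation described above can be illustrated with a short sketch. It is not the trial's actual randomization procedure, which this report does not detail; the blocked scheme, the `assign_arms` helper, and the seed are assumptions made only to show how the 4 treatment groups arise and how each agent is later compared across the pooled groups of the other.

```python
import random
from collections import Counter

# The 4 treatment groups of a 2 x 2 factorial design (from the text).
ARMS = [
    ("aspirin", "vitamin E placebo"),
    ("aspirin placebo", "vitamin E"),
    ("aspirin", "vitamin E"),
    ("aspirin placebo", "vitamin E placebo"),
]

def assign_arms(n_participants, seed=0):
    """Assign participants in shuffled blocks of 4 (hypothetical scheme)."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        block = ARMS[:]          # one slot per treatment group
        rng.shuffle(block)       # random order within the block
        assignments.extend(block)
    return assignments[:n_participants]

assignments = assign_arms(39876)          # randomized WHS participants (from the text)
arm_counts = Counter(assignments)

# The aspirin comparison pools both vitamin E groups, and vice versa.
n_active_aspirin = sum(c for (asp, _), c in arm_counts.items() if asp == "aspirin")
n_active_vitamin_e = sum(c for (_, vit), c in arm_counts.items() if vit == "vitamin E")
```

Because every participant contributes to both marginal comparisons, a factorial design tests 2 agents in a single cohort without doubling enrollment.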
Annually during the trial period, WHS participants were sent a 1-year supply of monthly calendar packs containing active agents or placebo, as well as questionnaires seeking information about compliance, side effects, the occurrence of relevant clinical endpoints, risk factors, and a comprehensive food frequency questionnaire. Study medications and endpoint ascertainment were continued in a blinded fashion through the scheduled end of the trial; randomized follow-up was completed in February 2005. At that time, rates of follow-up with respect to morbidity and mortality were 97.2% complete and 99.4% complete, respectively. The primary findings of the WHS with regard to the effects of aspirin and vitamin E on the primary trial endpoints of cardiovascular disease and cancer were presented in 2005 (12)(13)(14). Follow-up of the WHS cohort has continued without interruption since that time and is ongoing with 98% participation rates.
The WGHS cohort described here comprises 28 345 (70.6%) of the 39 876 WHS participants who provided a baseline blood sample adequate for plasma and DNA analysis before randomization and consented to ongoing analyses linking blood-derived observations with baseline risk factor profiles and incident disease events. The WHS trial and cohort follow-up, as well as the WGHS, were approved by the institutional review board of Brigham and Women's Hospital, Boston, MA, and monitored by an external data and safety monitoring board.
| endpoint validation |
|---|
WHO criteria are used to confirm the occurrence of myocardial infarction on the basis of symptoms and associated abnormal concentrations of cardiac enzymes or diagnostic electrocardiograms. Stroke is confirmed if the participant has a new neurologic deficit of sudden onset that persists for >24 h. Clinical information as well as computed tomographic scans or MRI are used to distinguish hemorrhagic from ischemic events. Cardiovascular deaths are confirmed by autopsy reports, death certificates, medical records, and information obtained from family members. Reports of coronary revascularization procedures (bypass surgery or percutaneous coronary angioplasty) are confirmed by record review. Transient ischemic attacks are confirmed if the neurologic deficit of sudden onset lasted for <24 h. The diagnosis of deep-vein thrombosis is confirmed by a positive venous ultrasonography or venography report, whereas the diagnosis of pulmonary embolism is confirmed by a positive angiogram or computed tomography scan of the chest, or a ventilation-perfusion scan with 2 or more mismatched defects. Deaths due to pulmonary embolism are confirmed when autopsy reports, symptoms, circumstances, and medical history are consistent with this diagnosis.
Cancers are confirmed on the basis of pathologic or cytology reports (96.8%) or, rarely, based on strong clinical and radiological or laboratory marker evidence (e.g., increased CA-125) when a pathology or cytology review was not conducted. All cancers are coded for site, type, and when available, metastatic spread.
Additional endpoints ascertained in the WHS include the occurrence of diabetes, incident hypertension, bone fracture, osteoporosis, cognitive decline, peripheral arterial disease, colonic polyps, and common visual disorders such as age-related macular degeneration and cataracts. The methods used for validation of these endpoints are described elsewhere (15)(16)(17)(18).
| baseline blood collection, processing, storage, and plasma phenotyping |
|---|
Funding from the Donald W. Reynolds Foundation (Las Vegas, NV) (19) has enabled biomarker analysis of each plasma sample in a core laboratory certified by the National Heart, Lung, and Blood Institute/CDC Lipid Standardization program. Concentrations of total cholesterol (TC) and HDL-C were measured enzymatically on a Hitachi 911 autoanalyzer (Roche Diagnostics) with day-to-day reproducibility of 1.36% and 1.07% for TC concentrations of 129.8 and 277.2 mg/dL, respectively (throughout this report, concentrations and units given are those reported in the referenced sources), and of 1.98% and 2.68% for HDL-C concentrations of 35 and 55 mg/dL, respectively. LDL-C was determined directly (Genzyme) with reproducibility of 2.16% and 1.98% for concentrations of 76.2 and 148.7 mg/dL, respectively. Apolipoprotein B100 and apolipoprotein A-I were measured by an immunoturbidimetric technique, also on the Hitachi 911 analyzer. These assays employed the WHO/IFCC standards, and a validation study against the assays used at the Northwest Lipid Research Laboratory revealed a correlation coefficient of 0.98, intercept of 0.26 mg/dL, and slope of 0.97 for apolipoprotein B100, and a correlation coefficient of 0.99, intercept of 0.264 mg/dL, and slope of 1.0 for apolipoprotein A-I (20). Reproducibility was 3.68% and 2.95% for apolipoprotein A-I concentrations of 56.4 and 164.2 mg/dL, respectively, and 4.94% and 4.13% for apolipoprotein B100 concentrations of 49.7 and 146.3 mg/dL, respectively. Triglycerides were measured enzymatically, with correction for endogenous glycerol, using a Hitachi 917 analyzer and reagents and calibrators from Roche Diagnostics; reproducibility was 1.52% and 1.49% for triglyceride concentrations of 82.5 and 178.8 mg/dL, respectively (21). In addition, nuclear MR-based lipoprotein profiling is available on the full study cohort (LipoScience).
High-sensitivity C-reactive protein (hsCRP) was measured using a validated immunoturbidimetric method (Denka Seiken) with reproducibility of 2.16% and 3.34% for hsCRP concentrations of 1.94 and 11.42 mg/L, respectively (22). Lipoprotein(a) [Lp(a)] was evaluated with an apo(a)-independent assay with reproducibility of 2.47% and 1.45% for Lp(a) concentrations of 18.5 and 53.3 mg/dL, respectively (23). Homocysteine was determined enzymatically (Catch) with reproducibility of 4.72% and 3.06% at concentrations of 6.0 and 13.3 µmol/L, respectively, and hemoglobin A1c was measured using turbidimetric immunoinhibition directly from packed red blood cells (Roche Diagnostics) with reproducibility of 3.63% and 3.77% at levels of 5.2% and 8.8%, respectively (24). Creatinine was measured by a rate-blanked method based on the Jaffe reaction using Roche Diagnostics reagents with reproducibility of 3.67% and 1.60% at concentrations of 1.17 and 6.40 mg/dL, respectively. Fibrinogen was measured by a mass-based immunoturbidimetric assay (DiaSorin) with reproducibility of 5.20% and 3.99% at concentrations of 99.1 and 273.7 mg/dL, respectively (25). Finally, sICAM-1 was measured by quantitative sandwich ELISA (R&D Systems) with reproducibility of 8.89% and 6.39% at concentrations of 171.8 and 289.1 µg/L, respectively (26).
Of samples received by the core laboratory, 27 748 (98%) underwent successful evaluation for all biomarkers.
| dna extraction and genotyping procedures |
|---|
SNP genotyping of these DNA samples is performed using the Illumina Infinium II assay (27) to query a genome-wide set of 315 176 haplotype-tagging SNP markers (the Human HAP300 panel) (28). We added to this a focused panel of 45 882 missense and haplotype-tagging SNPs selected to enhance coverage of genomic regions in which we have a strong a priori interest owing to the presence of genes believed to be of relevance to cancer as well as metabolic, cardiovascular, and inflammatory diseases (Human HAP300 Duo-plus). DNA samples are genotyped in batches of 95 WGHS participants with 1 CEPH (Centre d'Étude du Polymorphisme Humain) DNA (NA10846) included to monitor genotyping consistency and plate orientation. Genotyping reactions use 750 ng of genomic DNA where possible, although in some cases successful genotyping has been performed with as little as 45 ng of DNA. The Infinium II process is implemented using Illumina Infinium Robot Control software and monitored using the Illumina Infinium laboratory information management system. The hardware platform consists of 4 Tecan EVO liquid-handling robots, 8 hybridization ovens, 3 Illumina BeadStation confocal scanners, and dual-processor workstations with access to >1 TB of disk array storage to monitor workflow and generate high-quality reduced data. Genotype calls are generated and subjected to quality control using Illumina BeadStudio v3.1 software.
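The panel sizes and batch composition given above imply some simple bookkeeping, sketched here. The SNP counts, batch size, and control identifier come from the text; the helper names, the position of the control within a batch, and the assumption that all 28 345 cohort samples are run are illustrative only.

```python
import math

HAPLOTYPE_TAGGING_SNPS = 315_176   # Human HAP300 panel (from the text)
FOCUSED_PANEL_SNPS = 45_882        # added missense and tagging SNPs (from the text)
SAMPLES_PER_BATCH = 95             # WGHS participants per batch (from the text)
CEPH_CONTROL = "NA10846"           # control DNA included in every batch (from the text)

# Total markers queried per participant; consistent with "more than 360 000 SNPs".
total_snps = HAPLOTYPE_TAGGING_SNPS + FOCUSED_PANEL_SNPS

def batches_needed(n_samples):
    """Number of batches required when each holds 95 participant samples."""
    return math.ceil(n_samples / SAMPLES_PER_BATCH)

def layout_batch(sample_ids, control_position=0):
    """Return one batch with the control inserted at a hypothetical position."""
    if len(sample_ids) > SAMPLES_PER_BATCH:
        raise ValueError("a batch holds at most 95 participant samples")
    batch = list(sample_ids)
    batch.insert(control_position, CEPH_CONTROL)
    return batch

# Assumes, for illustration, that the whole WGHS cohort is genotyped.
n_batches = batches_needed(28_345)
example_batch = layout_batch([f"S{i:05d}" for i in range(95)])
```

Placing a known control at a fixed position in each batch means a rotated or mislabeled plate shows up as a control genotype appearing where a participant was expected.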
| participant confidentiality |
|---|
| statistical considerations and power for the wghs |
|---|
With regard to data processing and quality control, before any genetic analyses are performed, all SNPs within the WGHS are evaluated for high call rates, and the percentage of missing SNPs is calculated for each individual. For SNPs with adequate data, Hardy-Weinberg disequilibria are evaluated to identify potential genotyping errors. We also compare the Illumina-based SNP data for each individual participant against a panel of approximately 70 common SNPs that have previously been ascertained in the WHS population using alternative genotyping technologies; this step is used as a secondary check to ensure accurate specimen labeling before any analyses. Finally, we use principal-component analysis to examine the data for any evidence of population stratification.
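The first three quality-control checks described above can be sketched as follows. This is a minimal illustration, not the WGHS pipeline: the function names and the encoding of missing calls as `None` are assumptions, the Hardy-Weinberg check uses a standard 1-degree-of-freedom chi-square test, and no pass/fail cutoffs are shown because the text does not state them.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing calls; missing calls are encoded as None."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)            # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p_value

def concordance(calls_a, calls_b):
    """Share of SNPs with identical calls on two platforms, ignoring missing calls."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    return sum(a == b for a, b in pairs) / len(pairs)

# Per-SNP call rate across individuals, or per-individual rate across SNPs.
rate = call_rate(["AA", None, "AB", "BB"])
# A SNP in exact Hardy-Weinberg proportions, and one with no heterozygotes.
chi2_ok, p_ok = hwe_chi_square(250, 500, 250)
chi2_bad, p_bad = hwe_chi_square(500, 0, 500)
# Agreement between Illumina calls and calls from an alternative technology.
agreement = concordance(["AA", "AB", None], ["AA", "BB", "AA"])
```

A marked deficit of heterozygotes, as in the second example, is the typical signature of a clustering or calling error, whereas low concordance for one participant points to a labeling problem instead.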
With regard to issues of multiple hypothesis testing, initial analyses within the WGHS will seek to define those relationships between individual SNPs, haplotypes, or genetic pathways that determine either incident clinical events or intermediate phenotypic traits. For all initial analyses, the WGHS will follow recent guidelines (29)(30) in which the P-value for genome-wide significance is predetermined to be at a level of 10⁻⁷ or smaller, a conservative approach consistent with Bonferroni correction. For this approach, sample size and power for the WGHS to reach the level of statistical significance are presented in Fig. 1 for clinical events such as myocardial infarction or diabetes and in Fig. 2 for continuous intermediate phenotypes such as HDL cholesterol. As shown, the large size of the WGHS provides more than adequate power at a genome-wide level of significance for clinical endpoints with 500 or more incident events, as well as extremely high power to detect genetic differences on an additive model for intermediate phenotypes for which the polymorphism of interest explains as little as 0.15% of the variance (see Fig. legends for detail). Compared to smaller GWAS already underway, the WGHS is well positioned to address gene-gene and gene-environment interactions across a wide range of clinical outcomes and environmental exposures.
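The relationship between the genome-wide threshold and a Bonferroni correction, and the order of magnitude of power for a continuous trait, can be checked with a short calculation. This is a sketch under stated assumptions, not a reproduction of the analyses behind Figs. 1 and 2: it assumes a family-wise error rate of 0.05, one test per genotyped SNP, the full cohort of 28 345 with complete data, and a 1-degree-of-freedom additive test with a normally distributed statistic. The function name `power_additive_qt` is hypothetical.

```python
from statistics import NormalDist

N_SNPS = 315_176 + 45_882   # SNPs on the genotyping panel (from the text)
ALPHA_GW = 1e-7             # genome-wide significance level (from the text)
N_COHORT = 28_345           # WGHS cohort size (from the text)

# Bonferroni threshold for a family-wise error rate of 0.05 (assumed).
bonferroni_alpha = 0.05 / N_SNPS

def power_additive_qt(n, r2, alpha):
    """Approximate power of a 1-df test for a SNP explaining a fraction r2
    of the variance of a continuous trait in n individuals."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)      # two-sided critical value
    ncp_root = (n * r2 / (1 - r2)) ** 0.5    # square root of the noncentrality
    # Probability that the test statistic falls in either rejection tail.
    return std.cdf(ncp_root - z_crit) + std.cdf(-ncp_root - z_crit)

# Power for a variant explaining 0.15% of trait variance, as in the text.
power_small_effect = power_additive_qt(N_COHORT, 0.0015, ALPHA_GW)
# The same variant in a hypothetical cohort one-fifth the size.
power_smaller_cohort = power_additive_qt(N_COHORT // 5, 0.0015, ALPHA_GW)
```

Under these assumptions the Bonferroni threshold is about 1.4 × 10⁻⁷, so a cutoff of 10⁻⁷ is slightly stricter than Bonferroni correction over the genotyped SNPs alone. Comparing the two power values shows how strongly detection of small effects depends on cohort size.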
| relationship of the wghs to other gwas |
|---|
| Acknowledgments |
|---|
Financial Disclosures: Alex Parker is an employee of Amgen, Inc.
Acknowledgments: The WGHS Investigators are indebted to Joseph P. Miletich for his foresight in understanding the role of genetics in common diseases affecting women, to the staff of the Women's Health Study, and to the dedicated and conscientious women who are participating in this study.
| Footnotes |
|---|
| References |
|---|