Clinical Chemistry 55: 39-51, 2009.
First published November 21, 2008; 10.1373/clinchem.2008.107243
© 2009 American Association for Clinical Chemistry, Inc.
Population-Based Genomewide Genetic Analysis of Common Clinical Chemistry Analytes
Daniel I. Chasman1,a,
Guillaume Paré1 and
Paul M Ridker1
1 Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA.
aAddress correspondence to this author at: Division of Preventive Medicine, Brigham and Women's Hospital, 900 Commonwealth Ave., East, Boston, MA 02215. Fax 617-232-3541; e-mail dchasman{at}rics.bwh.harvard.edu.
Abstract
Background: Recent technologies enable genetic association studies of common clinical analytes on a genomewide basis in populations numbering thousands of individuals. The first publications using these technologies are already revealing novel biological functions for both genic and nongenic loci, and are promising to transform knowledge about the biological networks underlying disease pathophysiology. These early studies have also led to development of a set of principles for conducting a successful genomewide association study (GWAS).
Content: This review focuses on these principles with emphasis on the use of GWAS for plasma-based analytes to better understand human disease, with examples from cardiovascular biology.
Conclusions: The correlation of common genetic variation on a genomewide basis with clinical analytes, or any other outcome of interest, promises to reveal how parts of the genome work together in human physiology. Nonetheless, performing a genomewide association study demands an awareness of very specific epidemiologic and analytic principles.
Introduction
Rare cases of heritable extreme plasma levels of analytes such as cholesterol have been essential for revealing the network of genes and proteins contributing to the risk of future disease in healthy individuals (1)(2)(3). For example, studies of individuals with homozygous familial hypercholesterolemia, who may have LDL cholesterol concentrations >300 mg/dL (7.77 mmol/L) as well as premature atherothrombosis, led to critical insights about the biology of the LDL receptor and apolipoprotein B as well as therapeutic and even preventive strategies. Within the reference interval, however, more modest variation in plasma concentrations for a much larger collection of clinical analytes, including not only cholesterol measures but also C-reactive protein (CRP),1 hemoglobin A1c, and creatinine, is also both substantially heritable and strongly correlated with the risk of future disease (4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30)(31)(32). This observation underlies the possibility of genetic association studies of variation within the reference interval for a more comprehensive identification of biological processes involved in disease than may be feasible by analysis of rare cases of extreme variation.
For the past 3 years or so, it has been technically possible to perform genetic association studies on a genomewide scale for complex human traits, exemplified in this review by plasma-based analytes. What has emerged from publication of initial genomewide association studies (GWASs) is not only a set of new functional relationships between complex traits and genetic loci in genic and nongenic regions of the genome, but also a set of principles for conducting a successful GWAS. This review focuses on these principles with emphasis on the use of GWASs for plasma-based analytes to better understand human disease, with examples from cardiovascular biology. Excellent reviews of GWAS techniques focused on dichotomous outcomes such as disease status can be found elsewhere (33)(34)(35)(36).
Genomewide Genetic Variation in Human Populations
Almost always biallelic, single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in the human genome, and are now known to number in excess of 12.8 million (dbSNP, build 128) (37). At the same time, it now appears that a much smaller number, perhaps 2–3 million SNPs, are common, meaning they have a minor allele (i.e., less common allele) frequency of at least 5%. In populations with European or Asian ancestry, even fewer SNPs, perhaps 300 000 to 500 000, are necessary to sample the unique common genetic variation as a result of the correlation between SNPs, termed linkage disequilibrium (LD) (38)(39). SNPs chosen as surrogates for others on the basis of LD are termed tag SNPs. Recent estimates suggest that about 80% of common SNPs can be captured with 300 000 tag SNPs with a minimum correlation ($r^2$) of 0.8 (40)(41). Consistent with the "out-of-Africa" hypothesis for migration of ancestral humans (42), identifying tag SNPs for African populations is more complex. Not only do African populations have a greater number of common SNPs, but also the variation is older than in European or Asian populations and thus has been exposed to more recombination. As a consequence, SNPs in African populations are less correlated (i.e., lower LD), and tagging common African SNP variation requires perhaps 500 000 to 1 000 000 SNPs (43)(44)(45). In addition, population diversity within Africa, both north and south of the Sahara as well as east to west and between isolated communities, makes the choice of an ideal set of tag SNPs quite challenging in African ancestral populations.
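The tagging criterion described above can be illustrated with a minimal sketch. It assumes that LD is approximated by the squared Pearson correlation between minor-allele counts (0, 1, or 2) at two SNPs, which is a simplification of the haplotype-based $r^2$; the function names and the toy genotype vectors are hypothetical and are not taken from the cited studies.

```python
from statistics import mean

def genotype_r2(g1, g2):
    """Squared Pearson correlation between minor-allele counts at two SNPs.

    Assumption: genotype-level correlation stands in for haplotype-based r^2.
    Both SNPs must be polymorphic in the sample (nonzero variance).
    """
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    return cov * cov / (var1 * var2)

def is_tagged(g_tag, g_target, threshold=0.8):
    """True when the candidate tag SNP captures the target SNP at r^2 >= threshold."""
    return genotype_r2(g_tag, g_target) >= threshold

# Hypothetical minor-allele counts for 8 individuals at 3 SNPs.
snp_a = [0, 1, 2, 1, 0, 0, 1, 2]
snp_b = [0, 1, 2, 1, 0, 0, 1, 1]  # differs from snp_a in one individual
snp_c = [1, 0, 1, 2, 0, 1, 0, 1]  # largely unrelated to snp_a

print(is_tagged(snp_a, snp_b))  # snp_a is an adequate surrogate for snp_b
print(is_tagged(snp_a, snp_c))  # snp_a does not capture snp_c
```

In practice tag SNPs are selected from reference panels with dedicated software; the sketch only shows what the 0.8 threshold means for a single pair of SNPs.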
SNP choice aside, recent miniaturization technologies allow measurement of genotypes for as many as 1 000 000 or more tag SNPs across the whole genome for an individual in a single experiment without prohibitive cost. In general terms, these technologies are chip-based and use sequence-specific probes arrayed on the chip surface to interrogate prespecified SNPs through fluorescent readouts (46)(47). Genotypes for each SNP are determined by the intensity of fluorescent signals. In one implementation, the signals for the homozygous genotypes of the 2 possible alleles are fluorescent signals of 2 alternative colors, for example red and green. The signal for heterozygous genotypes has both colors, each with half the intensity of the signal of either color alone from the homozygous genotypes.
Study Samples
Although many study designs are possible, this review focuses on the population-based setting, for which success is critically dependent on application of sound epidemiological practices in assembling the study cohort (for examples of other designs, see (48)(49)(50)). As with conventional epidemiology, the essential principle is to minimize the risk of confounding due to factors that are simultaneously correlated with both the outcome of interest (e.g., a clinical analyte) and the exposure of interest, here the frequency of SNP alleles. Because of historical patterns of migration and isolation of human populations tens to hundreds of thousands of years ago, the frequency of SNP alleles will be closely related to population ancestry predating the relative mobility of recent centuries (42)(51). These same ancestral patterns may be accompanied by cultural characteristics, representing correlated exposures and potential sources of confounding. Although allele frequency does not vary across populations for most SNPs, alleles of many SNPs will be more prevalent in some populations than others (38)(40)(41)(43)(44)(45), and it is most reliable to strive toward ancestral homogeneity when ascertaining samples for population-based genetic association. At the very least, homogeneity should be ensured on the basis of self-report.
Eliminating all ancestral diversity is difficult (if not theoretically impossible) among humans, however, and analytic techniques may be used to address confounding due to the residual diversity (52)(53)(54)(55)(56). The basic approach relies on identifying a subset of SNPs whose allele or genotype frequencies are most variable across populations to provide quantitative estimates of ancestry for each individual in a study. These estimates can be used to account for potential confounding from the residual ancestral heterogeneity in statistical tests of association between SNPs and the outcome of interest, which may be a continuous clinical measure, e.g., a plasma-based clinical analyte or height. In a classic example, the analysis of height by GWAS among European-Americans found candidate variation in a gene encoding lactase activity (LCT)2 (57). The allele frequencies of LCT variants also vary across Europe, however, and after accounting for sub-European ancestry, the association between lactase genetic variation and height could no longer be demonstrated.
Even when the population is homogeneous, genetic analysis of clinical analytes may be confounded more directly by properties of the clinical assay. Use of several assay methods, each with different characteristics, in different subsets of the population may provide sources of inadvertent bias and confounding. Although not always feasible, the best way to avoid these issues is to use 1 set of specimen collection and storage procedures as well as 1 core laboratory for all plasma assays to be evaluated in a given GWAS. For example, in the Women's Genome Health Study (58), a single core laboratory standardized by the Centers for Disease Control and Prevention has been used to assay a wide range of lipid, inflammatory, and hemostatic biomarkers in a cohort of nearly 28 000 individuals. By doing so, intra- and interassay CVs due to assay characteristics are minimized, reducing noise and increasing the likelihood of being able to observe genetic effects. Similarly, as all participants in this cohort had plasma collected, processed, stored, thawed, and assayed in an identical manner, variation in analyte concentrations resulting from sample processing is minimized, as is a concomitant potential for confounding of genetic signals.
More pernicious still may be direct interaction between the assay and genetic variation. This situation arises when the assay reagents are sensitive to the form of the analyte, as may happen with nonsynonymous SNPs in genes encoding protein analytes or when unlinked variation alters posttranslational modification of a protein analyte. For example, a widely used commercial assay for circulating plasma soluble intercellular adhesion molecule 1 (sICAM-1) is known not to recognize the K56M (rs5491) variant of ICAM-1 (59). Furthermore, this variant has a prevalence of about 35% in African ancestral populations but <1% in European ancestral populations. As a result, heterozygous carriers of this variant have, on average, half the mean sICAM-1 concentration of noncarriers when using this assay, whereas homozygous carriers have undetectable plasma sICAM-1. A genetic association study must account for this phenomenon (60) or risk falsely concluding that K56M is an extremely strong, population-dependent determinant of sICAM-1 levels.
Data Quality
Data quality with current platforms for whole genome genotyping is quite good and provides unambiguous genotype information for the vast majority of SNPs. Thus genotype assignment for these SNPs may be assured as long as the source DNA and assay procedures are adequate, as evidenced by successful genotypes for most (say, 90%) of the SNPs on the chip. Still, as with measurement of the clinical analytes, it is important to genotype all samples with the same procedures and in the same laboratory to diminish the risk of unnecessary genotyping biases (61)(62). For a small number of SNPs, the raw data may still require more analysis or even visual inspection to discriminate between the 3 genotypes or to decide whether the genotyping assay is useful at all (63)(64)(65). To some extent, the problem of faulty SNP data is being resolved experimentally, as the SNP assays in each new iteration of the genotyping platforms are increasingly robust (66)(67). At the same time, recent improvements in algorithms for calling genotypes promise to improve treatment of problematic SNPs. Some recent data reduction strategies are conducted in a Bayesian framework and evaluate the probability of each of the 3 genotypes as a continuously valued rather than discrete quantity for use as prior information in hypothesis testing (68). This approach may be helpful for the small minority of SNPs with marginal raw data, but significant associations involving SNPs with fundamentally ambiguous genotype calls ought still to be replicated using alternative genotyping assays that can discriminate the 3 genotypes cleanly.
SNPs passing quality control in the data reduction stage may also be flagged as suspicious if they have genotype frequencies grossly deviating from the proportions expected in a sample from a randomly mating population. These proportions, termed Hardy-Weinberg equilibrium, derive simply from the binomial distribution as $f_1^2$ for homozygotes of allele 1, $2 f_1 f_2$ for the heterozygotes, and $f_2^2$ for homozygotes of allele 2, where $f_1$ and $f_2$ are the population frequencies of the 2 alleles of a biallelic SNP. With the current genotyping platforms, these deviations often arise from inability to distinguish heterozygous genotypes from homozygous genotypes in the raw data. Given the hundreds of thousands of genotyped SNPs for each sample, it is customary to exclude SNPs on the basis of deviation from expected Hardy-Weinberg equilibrium only when the evidence is overwhelming (e.g., $P < 10^{-6}$).
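As an illustration of the Hardy-Weinberg check, the sketch below compares observed genotype counts with the expected proportions using a 1-degree-of-freedom chi-square goodness-of-fit test. The choice of test is an assumption (the text does not name one, and exact tests are also in common use), and the function name and genotype counts are hypothetical.

```python
from math import erfc, sqrt

def hwe_chi2_pvalue(n11, n12, n22):
    """P value for deviation from Hardy-Weinberg proportions.

    n11, n12, n22: observed counts of allele-1 homozygotes, heterozygotes,
    and allele-2 homozygotes. Assumes both alleles are present in the sample.
    """
    n = n11 + n12 + n22
    f1 = (2 * n11 + n12) / (2 * n)  # sample frequency of allele 1
    f2 = 1.0 - f1                   # sample frequency of allele 2
    expected = (n * f1 * f1, n * 2 * f1 * f2, n * f2 * f2)
    observed = (n11, n12, n22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))

# Hypothetical counts: the first SNP sits at the expected proportions,
# the second shows a marked deficit of heterozygotes.
p_ok = hwe_chi2_pvalue(490, 420, 90)
p_bad = hwe_chi2_pvalue(600, 200, 200)
flag_for_exclusion = p_bad < 1e-6  # threshold quoted in the text
```

A heterozygote deficit of the kind in the second example is what would be expected if heterozygous genotypes were being miscalled as homozygous.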
Hypothesis Testing
In the population-based design, the usual mode of hypothesis testing for association between genetic variation and a clinical analyte borrows from standard regression techniques. Often (but not always) there is a linear (also termed additive) relationship between the number of copies of the minor allele (0, 1, or 2) of an associated SNP and the plasma concentration of an analyte. Hypothesis testing is thus equivalent to regressing the (possibly normalized) analyte concentration against the minor allele count and rejecting the null hypothesis dependent on the significance of the regression coefficient. When effects are linear, the estimate of the coefficient provides a measure of the per-allele genetic effect. Similarly, the estimate of the proportion of the variance explained by the genetic association in the linear model provides a measure of the total genetic effect, which will be closely related to statistical power (see below). At the same time, the regression formalism allows for many additional analyses, including adjustment for known influences of nongenetic covariates, adjustment for other genetic variants, model selection procedures, and interactions between SNPs and clinical covariates or other genetic variants. For example, to reduce the variance due to known, nongenetic factors and increase the relative effect of genetics, analyses of LDL cholesterol, CRP, or sICAM-1 may use residual values after adjustment for age, smoking status, blood pressure, obesity, and (among women) hormone replacement therapy use (60)(69)(70). Highly efficient software for performing the regression analysis on a genomewide scale is freely available (53).
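The additive test for a single SNP can be sketched as an ordinary least-squares regression of the analyte on the minor allele count. This is a minimal illustration and not the freely available genomewide software cited above: the function name and data are hypothetical, no covariates are included, and the P value uses a normal approximation to the t distribution, which is reasonable only for the sample sizes in the thousands discussed in this review.

```python
from math import erfc, sqrt
from statistics import mean

def additive_association(allele_counts, analyte):
    """Regress analyte concentration on minor-allele count (0, 1, or 2).

    Returns (per-allele effect, standard error, proportion of variance
    explained, two-sided P value). Requires residual variation in the data.
    """
    n = len(analyte)
    mx, my = mean(allele_counts), mean(analyte)
    sxx = sum((x - mx) ** 2 for x in allele_counts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(allele_counts, analyte))
    syy = sum((y - my) ** 2 for y in analyte)
    beta = sxy / sxx                 # per-allele shift in the mean
    rss = syy - beta * sxy           # residual sum of squares
    se = sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    r2 = 1.0 - rss / syy             # proportion of variance explained
    p = erfc(abs(beta / se) / sqrt(2))  # normal approximation (assumption)
    return beta, se, r2, p

# Hypothetical, normalized analyte values for 12 individuals.
counts = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
values = [-1.2, -0.8, -1.1, -0.9, 0.1, -0.1, 0.2, -0.2, 0.9, 1.1, 1.0, 1.0]
beta, se, r2, p = additive_association(counts, values)
```

Adjustment for nongenetic covariates, as described above, could be handled by first regressing the analyte on those covariates and passing the residuals to the same function.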
In theory, the relationship between allele count and analyte concentration need not be linear. Instead, the mean plasma concentrations for the 3 genotype groups could be independent, including cases in which the mean value for heterozygotes is not intermediate between the mean values for the 2 homozygous genotypes. More commonly, threshold effects dependent on 1 or 2 copies of an allele may occur. When either 1 or 2 copies of the minor allele have the same influence on the analyte concentration, the effect is termed dominant; when 2 copies of the minor allele are necessary for a differential concentration, the effect is termed recessive. Some analysis plans therefore perform regression by encoding the 3 genotype groups separately and allowing an additional degree of freedom in the hypothesis testing compared with the more constrained linear model. Setting 1 genotype, usually the most common genotype, as a reference, significance in this formalism can be determined for each of the nonreference genotypes, again by regression procedures. As under the linear assumption, the additional conveniences of the regression formalism can also be exploited in models based on genotype. In practice, though, only the most unusual modes of association will experience a loss of power when imposing the linear assumption in hypothesis testing. As long as the mean value for the heterozygous genotype is intermediate between the values for the homozygous genotypes and the minor allele frequency is not too low, the additive model will be adequate (71).
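The alternative modes of association differ only in how the genotype is encoded before regression. The sketch below shows one possible encoding, assuming the homozygote with 0 copies of the minor allele is the reference genotype; the function and model names are illustrative.

```python
def encode_genotype(minor_allele_count, model="additive"):
    """Return the regression predictor(s) for one genotype under a given model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1, or 2")
    if model == "additive":
        # Linear in the number of minor alleles: 1 degree of freedom.
        return [minor_allele_count]
    if model == "dominant":
        # 1 or 2 copies have the same influence.
        return [1 if minor_allele_count >= 1 else 0]
    if model == "recessive":
        # 2 copies are necessary for a differential concentration.
        return [1 if minor_allele_count == 2 else 0]
    if model == "genotypic":
        # Separate indicators for each nonreference genotype: 2 degrees of freedom.
        return [int(minor_allele_count == 1), int(minor_allele_count == 2)]
    raise ValueError("unknown model: " + model)

# A heterozygote under each encoding.
het = {m: encode_genotype(1, m)
       for m in ("additive", "dominant", "recessive", "genotypic")}
```

The genotypic encoding is the only one that lets the heterozygote mean fall outside the range of the two homozygote means, at the cost of the additional degree of freedom.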
The distribution of P values resulting from hypothesis testing of hundreds of thousands of SNPs in a GWAS can help evaluate study quality and also the likelihood of having discovered true associations. It is thus critically important to compare the distribution of P values from hypothesis testing to the uniform distribution expected under the null hypothesis, typically through the use of a quantile-quantile (QQ) plot. Deviations from the null throughout the distribution most often signify systematic inflation of the test statistic, usually due to heterogeneity in the sample related to ancestral admixture or, for clinical analytes, heterogeneity in the assay (Fig. 1A). It can also result from systematic genotyping errors that eluded the initial quality controls. Deviation from the null among only the most significant SNPs (i.e., smallest P values), however, may be evidence for true associations. Because most GWASs test 300 000 to 1 000 000 SNPs per sample, perhaps tagging a total of $10^6$ SNPs in non-Africans to $2 \times 10^6$ SNPs in Africans, these deviations will only be evident for SNPs with P values smaller than $10^{-5}$ to $10^{-7}$ (Fig. 1B) (33)(35). As a corollary, the genomewide level of statistical significance among non-African populations after correction for the multiple hypothesis testing by the Bonferroni procedure will require an uncorrected P value $< 0.05/10^6 = 5 \times 10^{-8}$. Once candidate genomewide SNPs have been identified, repeating the analysis after adjusting analyte concentrations for these candidates usually results in P values that are largely restored to the expectation under the null, unless the adjustment reveals new conditional relationships with genomewide significance (Fig. 1C). As a complement to the genomewide significance standard of $P < 5 \times 10^{-8}$ (33), procedures for estimating the fraction of false associations as a function of P value thresholds, also known as the false discovery rate, can be used to identify candidate SNPs for true association with analyte concentration on a genomewide basis (72).
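The Bonferroni threshold and the coordinates of a QQ plot are simple to compute. The sketch below assumes the common plotting convention in which the i-th smallest of n P values is compared with the expected quantile (i - 0.5)/n on a negative log10 scale; that convention and the function names are assumptions, not details given in the text.

```python
from math import log10

def bonferroni_threshold(alpha=0.05, n_tests=10**6):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

def qq_points(p_values):
    """(expected, observed) pairs of -log10 P values for a QQ plot.

    Under the null hypothesis P values are uniform on (0, 1), so the points
    should lie along the diagonal. All P values must be greater than 0.
    """
    n = len(p_values)
    ordered = sorted(p_values)
    return [(-log10((i + 0.5) / n), -log10(p)) for i, p in enumerate(ordered)]

threshold = bonferroni_threshold()              # about 5e-8 for 10^6 tests
points = qq_points([0.5, 0.1, 0.9, 1e-9])       # hypothetical P values
expected_top, observed_top = points[0]          # most significant SNP
is_genomewide_significant = 1e-9 < threshold
```

Points rising above the diagonal only at the extreme right of such a plot correspond to the pattern described for true associations; a lift along the whole curve corresponds to systematic inflation.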

Figure 1. Distribution of P values in a genomewide scan with 317 000 SNPs with a hypothetical analyte in a white population of 5000 individuals.
Comparison of observed P values to the expected distribution under the null hypothesis in a QQ plot. (A), QQ plot of P values from inflated test statistic as may occur in the presence of population stratification, systematic genotyping errors, or systematic heterogeneity in the analyte assay. Deviation from the expectation under the null (dotted line) is observed throughout the distribution of P values. Dashed line indicates genomewide significance ($P < 5 \times 10^{-8}$). (B), QQ plot of P values lacking evidence of inflation showing deviation from the null expectation only among the most significant SNPs. (C), QQ plot of P values after adjustment of hypothetical analyte concentrations for SNPs reaching genomewide significance in (B), revealing an essentially null distribution.
The preceding remarks are not meant to exclude other modes of analysis. To the contrary, genetic analysis of genomewide data in large populations continues to evolve as an area of intense research. It is certainly possible that some associations will be revealed only by haplotypes (combinations of neighboring SNPs considered simultaneously) instead of individual SNPs (73)(74)(75)(76). Alternatively, some SNPs (or haplotypes) may act through effects on the variance of analyte concentration rather than on the mean. Further, some modes of association may be explained only by interaction with other clinical covariates, particularly when genotype-phenotype relationships are being sought to explain actual clinical events. As a classic example, the relationship between genetic variation in the alcohol dehydrogenase gene and incident myocardial infarction is, not surprisingly, mediated in part by the amount of alcohol consumption (77). When the interaction variable is time or age, longitudinal analysis may be indicated (78). The availability of plasma concentrations measured sequentially over a period of years in the same individual can aid greatly in this process. Finally, all of the analysis modes can be recast in terms of the Bayesian rather than the frequentist framework, especially as experience with GWAS continues to inform prior distributions for efficient effect estimates (79)(80)(81).
Neither the conventional analytic techniques nor the more intricate approaches in the preceding paragraph address the problem of assessing more modest yet true associations that may not meet genomewide significance. These relationships may not alter the bulk distribution of P values in a meaningful way (Fig. 1) and will be statistically indistinguishable from the vast majority of SNPs with P values less than nominal significance due to chance alone in a GWAS. The number of these modest associations will depend on the outcome of interest, but the ability to detect increasing numbers of true associations with increasing sample size for some clinical outcomes suggests that SNPs with modest effects can be expected (68). To be sure, larger studies with tens or even hundreds of thousands of samples, either from single populations or accumulated through metaanalysis, will permit detection of modest associations (82)(83). Alternatively, identifying these modestly associated SNPs may be possible with analytic methods that introduce prior biological knowledge to focus attention on SNPs most likely to influence an outcome of interest (84)(85)(86).
Power
Given the standard hypothesis testing formalism using linear regression and the additive assumption, it is instructive to explore the power for detecting genomewide associations with clinical analytes (or any other quantitative trait). The critical parameters in the power calculations are the sample size, which is often in the range of 1000 to 5000 individuals; the effect size, which can be summarized by the proportion of the variance explained by the 3 genotypes for a biallelic SNP; and finally the desired significance level, which is $5 \times 10^{-8}$ for genomewide significance in non-African populations. Thus, at genomewide significance in a GWAS with 1000, 2000, or 5000 samples, the proportions of variance explained for 80% power are approximately 3.9%, 2.1%, or 0.8%, respectively (Fig. 2A). Under the additive assumption, the proportion of variance explained may be deconstructed into the combination of the per-allele shift in the (normalized) mean concentration of the clinical analyte and the minor allele frequency of the SNP. For example, in a study with 2000 samples, there would be 80% power for detecting shifts of about 0.25 SD or 9 mg/dL for the case of LDL cholesterol (SD is approximately 36 mg/dL in whites) for SNPs with minor allele frequency greater than about 0.2 (Fig. 2A). However, equivalent power for rarer minor alleles requires dramatically larger per-allele shifts in means. Below 5% minor allele frequency, the per-allele shift required for genomewide significance becomes very large, and this trend underlies limiting population GWASs to associations with minor allele frequency in the range of 5%–50%. However, the threshold of 5% remains somewhat arbitrary, and some SNPs with minor allele frequency in the range of 1%–5% may nonetheless have large enough effects on plasma analytes for adequate power with reasonable sample sizes. For example, a recently described SNP in PCSK9 (proprotein convertase subtilisin/kexin type 9) has minor allele frequency 1.6% and decreases LDL cholesterol by almost 0.5 SD while definitively decreasing cardiovascular risk (87)(88).
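The relationship among sample size, variance explained, minor allele frequency, and per-allele shift can be explored with a rough calculation. The sketch below is an approximation under stated assumptions and not the method used for Fig. 2: it treats the test statistic as normal with noncentrality N·R²/(1 − R²), ignores the negligible opposite tail, and assumes Hardy-Weinberg proportions so that the genotype variance is 2·MAF·(1 − MAF). Its results come out close to, but not identical with, the values quoted above.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def required_variance_explained(n_samples, alpha=5e-8, power=0.80):
    """Approximate proportion of variance a SNP must explain for the given power."""
    z_alpha = -STD_NORMAL.inv_cdf(alpha / 2)  # two-sided critical value
    z_power = STD_NORMAL.inv_cdf(power)
    ncp = (z_alpha + z_power) ** 2            # required noncentrality
    # Solve ncp = N * R2 / (1 - R2) for R2.
    return ncp / (n_samples + ncp)

def per_allele_shift_sd(variance_explained, maf):
    """Per-allele shift (in SD units) implied by R2 under the additive model.

    Assumes Hardy-Weinberg proportions: R2 = shift^2 * 2 * maf * (1 - maf).
    """
    return sqrt(variance_explained / (2 * maf * (1 - maf)))

for n in (1000, 2000, 5000):
    print(n, round(100 * required_variance_explained(n), 2), "% of variance")

# Shift needed with 2000 samples at minor allele frequency 0.2, expressed
# in SD units and in mg/dL of LDL cholesterol (SD of 36 mg/dL from the text).
shift_sd = per_allele_shift_sd(required_variance_explained(2000), 0.2)
shift_ldl_mg_dl = shift_sd * 36
```

Because the genotype variance shrinks as the minor allele frequency falls, the same function shows how steeply the required per-allele shift rises for rare alleles.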

Figure 2. Power for detecting genetic effects on plasma concentration of a clinical analyte assuming a linear mode for the association and the hypothesis testing.
(A), 80% power for studies with 1000, 2000, and 5000 samples parameterized by minor allele frequency and the standardized, per-allele shift in the mean analyte concentration. Thin contours indicate the proportion of variance explained by each combination of minor allele frequency and per-allele shift in mean analyte level. (B), Distribution of P values in a study with 2000 samples dependent on the proportion of variance explained under the alternative hypothesis. (C), Decrease in power to detect genetic effects on interindividual variation in a model that also considers the contribution of intraindividual variation and measurement error to overall variance in analyte concentration. Fr. var. error, fraction of variance due to error.
Still, underpowered but true associations with lesser effects may be identified at genomewide significance in the discovery phase of a GWAS, and it is equally instructive to examine the distribution of P values arising from fluctuations due to sampling in a population-based study. The distributions can be quite broad, so that even genetic effects having considerably <80% power will still reach genomewide significance an appreciable part of the time. For example, with 2000 samples, effects explaining 1.5% of the variance will reach genomewide significance 52% of the time, although an effect explaining 2.1% of the variance is required for 80% power at genomewide significance. Effects explaining 1.0% of the variance will reach genomewide significance 16% of the time. These statistical fluctuations can also result in unwarranted pessimism or optimism about associations, since P values considerably larger than $10^{-6}$ or smaller than $10^{-14}$ will occur about 20% of the time, even when the genetic effects have nominal 80% power for genomewide significance (Fig. 2B).
For a clinical analyte, power to detect genetic effects on interindividual variation will also be influenced by intraindividual variation and variation in the assay determination. These last 2 quantities are related to measurement error, and as they increase, the proportion of the variance due to interindividual variation will decrease. Thus, power to detect genetic effects on interindividual variation will decrease as well. For a GWAS with 1000–5000 samples, there is as much as a 15%–20% loss of power when up to 20% of the variance is attributable to the combination of intraindividual variation and measurement error (Fig. 2C). For example, the CV for triglycerides can be as much as 30%, mostly due to intraindividual variability presumably related to fasting status (89). With a population mean of about 140 mg/dL for triglycerides, this total measurement error will contribute as much as about 20% to the total variance and reduce power in a GWAS with 2000 samples by as much as 20% for effects explaining 1.5%–2.0% of the interindividual variance (Fig. 2C). For these reasons, GWASs targeting triglyceride concentrations may wish to control strictly for fasting status and other influences on intraindividual variation.
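The loss of power from measurement error can be sketched by diluting the variance explained. The model below is an assumption rather than the one used for Fig. 2C: if a fraction e of the total variance is due to intraindividual variation and assay error, a SNP explaining R² of the interindividual variance explains only R²·(1 − e) of the total. Power again uses a normal approximation with noncentrality N·R²/(1 − R²), and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_additive(n_samples, variance_explained, alpha=5e-8):
    """Approximate power of the additive test at significance level alpha."""
    z_alpha = -STD_NORMAL.inv_cdf(alpha / 2)
    ncp = n_samples * variance_explained / (1 - variance_explained)
    # The contribution of the opposite tail is negligible and is ignored.
    return STD_NORMAL.cdf(sqrt(ncp) - z_alpha)

def power_with_error(n_samples, interindividual_r2, error_fraction, alpha=5e-8):
    """Power when error_fraction of the total variance is measurement error."""
    observed_r2 = interindividual_r2 * (1 - error_fraction)
    return power_additive(n_samples, observed_r2, alpha)

# 2000 samples, an effect explaining 2.0% of the interindividual variance,
# with and without 20% of the total variance attributable to error.
power_clean = power_with_error(2000, 0.02, 0.0)
power_noisy = power_with_error(2000, 0.02, 0.2)
power_loss = power_clean - power_noisy
```

Under these assumptions the loss is on the order of the 20% described above, which is consistent with the recommendation to control sources of intraindividual variation such as fasting status.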
Confirmatory Studies: Replication and Validation
Whether adequately powered or not, SNP associations meeting genomewide significance should, in general, be confirmed by replication studies for 2 purposes (90). First, if performed with an alternative genotyping technology, replication reduces the possibility of apparent associations arising from systematic correlation between spurious genotypes and the clinical analyte. Second, when performed in a separately ascertained population, replication greatly reinforces the degree to which findings can be generalized. Following the discussion above, confirmation in a population with different clinical characteristics, environmental exposures, or, most dramatically, alternative ancestry may be particularly relevant for confirming the association. Recent examples of highly replicating associations for clinical analytes include data at the SORT1 (sortilin 1) locus for influences on LDL cholesterol and plasma apolipoprotein B (apoB), or the GCKR [glucokinase (hexokinase 4) regulator] locus for influences on triglycerides and the inflammatory marker CRP (32)(70)(91)(92)(93)(94)(95). Highly confirmed loci for disease status have been identified for diabetes, aortic aneurysm, and cardiovascular events at 9p21.3 (32)(68)(82)(96)(97)(98)(99)(100)(101), for type 2 diabetes alone at several loci (32)(82)(102), for type 1 diabetes (103), inflammatory bowel disease (104)(105), macular degeneration at the CFH (complement factor H) and other loci (106)(107)(108), atrial fibrillation at 4q25 (109), and both vascular events and statin response with KIF6 (kinesin family member 6) (110)(111)(112).
It is important to recognize that failure to replicate in an ancestrally different population need not invalidate a candidate association. It would not be surprising if differing LD relationships in ancestrally different populations diminished the ability of a tag SNP chosen from one sample to adequately capture the causal SNP in a second sample. LD persistence in African ancestral populations, for example, averages half as many bases as in non-African populations (43)(45). In comparing populations with divergent ancestry, it may be sensible to replicate by examining population-specific tag SNPs in a candidate region. As discussed above, ancestry may be related strongly enough to allele frequency to account for a failure to replicate. For example, factor V Leiden, one of the most important genetic determinants of venous thromboembolism, occurs in 5% to 8% of most European populations, where it is associated with a high attributable risk. However, replication of this effect would be difficult if not impossible in an Asian population, where factor V Leiden is rare (and, likely as a consequence, the incidence of venous thrombosis is also far lower) (113). Finally, the underlying epidemiology of the disease of interest may be as important as ancestral impact on allele frequency, and it is also not unreasonable to anticipate that different SNPs might have different associations by sex or according to different environmental exposures that may be ubiquitous in one geographic region but almost absent in another. For example, an association involving genetic variation in the renin-angiotensin system that is found in a geographic zone with high salt intake may not be possible to validate in regions with low salt consumption or in populations with a low frequency of salt-sensitive hypertension (114).
Recent results have addressed the value of internal replication, in which the main study population is divided into separate discovery and validation samples with essentially the same ascertainment (115). In the discovery sample, SNPs are identified at one significance standard, say genomewide significance. Then, these SNPs are validated in the second sample at a significance standard that is typically lower but adequate for nominal significance (P < 0.05) after Bonferroni correction for the small number of SNPs carried forward from discovery. However, it can be demonstrated, perhaps counterintuitively, that the 2-stage procedure has less statistical power than analysis in the whole sample at once, leaving validation best performed in separately ascertained, external samples. These same analytic findings suggest increasing use of metaanalysis in evaluating associations in several populations, rather than assigning somewhat arbitrary precedence to each population in a staged analysis plan (e.g., (82)). Often, the relevant parameters from separate populations for metaanalysis may be derived from the literature or by collaboration. As more GWASs are performed and data become publicly available, metaanalyses without the need for additional experimentation become increasingly feasible (116)(117).
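One simple way to combine per-population results is a fixed-effect, inverse-variance-weighted metaanalysis of the per-allele effect estimates. The text does not specify a metaanalytic method, so this choice is an assumption; the function name and the example estimates are hypothetical, and the P value uses a normal approximation.

```python
from math import erfc, sqrt

def fixed_effect_meta(estimates):
    """Combine (beta, standard_error) pairs from separate populations.

    Each study is weighted by the inverse of its variance, so more precise
    estimates contribute more to the pooled per-allele effect.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = sqrt(1.0 / total_weight)
    p = erfc(abs(beta / se) / sqrt(2))  # two-sided, normal approximation
    return beta, se, p

# Hypothetical per-allele effects (in SD units) for one SNP in 2 populations.
pooled_beta, pooled_se, pooled_p = fixed_effect_meta([(0.20, 0.05), (0.30, 0.05)])
```

Only the effect estimate and its standard error are needed from each population, which is why such parameters can often be taken from the literature or obtained by collaboration. A fixed-effect model assumes a common underlying effect, which may not hold across populations with different ancestry or exposures.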
 |
Interpretation
|
|---|
Once initial discovery is complete and replication studies have validated the candidate associations, it is common to ask whether the identified SNPs have a causal role in the association or whether they are simply in LD with a causal variant. If a candidate SNP appears to be functional (for example because it encodes an amino acid substitution or disrupts sequences with known or inferred biological function), a strong hypothesis may already exist for pursuing confirmatory experimental functional studies. Otherwise, it may be necessary to look for stronger associations at the candidate locus by genotyping additional variants chosen strategically from dbSNP or the HapMap, 2 publicly available repositories of SNP information. For some additional SNPs, genotypes may be estimated by purely computational methods through LD relationships in the HapMap (38)(53)(81)(118). A more thorough job of finding the best candidates for biological function will often involve resequencing a candidate region to find SNPs that either were not included or were overlooked in the original analysis, often owing to low minor allele frequency (119). This approach to follow-up has increasing appeal as high-throughput sequencing techniques continue to be more affordable.
The causal variant need not be a single SNP at all, however, but instead may be a combination of SNPs defining a haplotype or even copy number variants involving long-range perturbations in sequence. Whereas analytic methods exist for examining both of these possibilities, deciphering the precise causal relationships between complex patterns of variation and clinical analyte concentration will rarely be simple. Often, an unambiguous basis for causality may be deduced only by combining the genetic findings with experimental analysis in vitro or in vivo.
Detailed nature of the associated variation aside, one of the great hopes of GWAS technology is the identification of new functional relationships for both known genes and relatively unexplored parts of the genome. To be sure, the first series of published GWASs of plasma-based intermediate phenotypes have fulfilled this promise, and are furthering an understanding of the networks of interconnected proteins that contribute to the concentration of a clinical analyte (60)(69)(70)(92)(93)(94)(120). For example, 1 recent GWAS of plasma CRP concentration (69), a marker of inflammation as well as vascular risk and diabetes, identified proteins previously recognized for roles in metabolic processes, including GCKR, LEPR (leptin receptor), HNF1A (HNF1 homeobox A), APOE (apolipoprotein E), CRP (C-reactive protein, pentraxin-related), and IL6R (interleukin-6 receptor) (Fig. 3A). Thus, through the connection to plasma CRP concentrations, these data provide crucial genetic evidence for a biological pathway unifying inflammation and metabolic processes, and suggest further links between diabetes and vascular disease. Alternatively, rather than suggesting new connections between known genes, some associations may suggest new functions for loci with the hallmarks of protein-encoding properties, but no known biological activity, and thus begin to place these genelike regions in a biological context. Finally, associations at loci lacking conventional genic structure begin to ascribe function to unannotated regions of the genome, as was suggested again by the genomewide association between a gene-free region of 12q23.2 and plasma CRP concentrations. Certainly, biological function for this region may be suspected from its high level of sequence conservation among vertebrates (Fig. 3B).

Figure 3. Genetic associations across the genome for plasma concentration of CRP in whites.
(A), The strongest candidate genes at loci with genomewide level of significance include LEPR, IL6R, CRP, GCKR, HNF1A, and APOE. Variation in a gene desert region of 12q23.2 also achieves genomewide significance (dotted horizontal line) [after (69)]. (B), Known transcripts and extreme evolutionary conservation among vertebrates within about 250 kb of the genomewide associations for plasma CRP in a gene desert region of 12q23.2.
Deeper understanding of biological function and disease processes may also be revealed by exploring whether the associations of candidate SNPs with plasma analytes can be extended to associations with a suitable clinical outcome (32)(94). As with the initial GWAS, cohort design is crucial for the success of these studies, and must be carefully scrutinized to avoid confounding. Although effects on disease risk are expected to be modest, for example with relative risk estimates <2, the threshold for significance in these studies will be less stringent than in the initial GWAS, since only a few variants will likely be pursued, presenting only minimal burden of multiple hypothesis testing. Of course, discovering genes related to disease via analysis of clinical analytes raises the possibility of identifying new drug targets and therapeutic strategies.
Because the effects on disease of individual candidate variants arising from genetic studies of plasma analytes are expected to be small, some investigators have begun testing multiple candidates simultaneously, i.e., in aggregate. In one approach, a test of association is performed by including a set of candidate SNPs as separate terms in a single (logistic) regression model and evaluating the total contribution of the genetic terms to risk. In another approach, one assumes that the effects of each SNP on disease status will be roughly equivalent, allowing the construction of a genetic risk score (GRS) as the count of risk alleles carried by an individual (121)(122)(123). Including the GRS as the independent variable in hypothesis testing may yield only an average estimate of risk, but it may also reinforce a pathophysiologic link between a set of genes and a clinical analyte and disease that could not be established with any single candidate variant alone. In a recent example, 9 SNPs, selected for their effects on plasma lipid fractions, were used to construct a GRS associated with cardiovascular risk (95).
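As a minimal sketch of the second approach, an unweighted GRS is simply the sum of risk-allele counts over the selected SNPs. The SNP identifiers, genotypes, and function name below are hypothetical placeholders, not the variants used in the cited study.

```python
from typing import Dict, List


def genetic_risk_score(genotypes: Dict[str, int], risk_snps: List[str]) -> int:
    """Unweighted GRS: total number of risk alleles carried across risk_snps.

    genotypes maps each SNP identifier to the number of risk alleles
    (0, 1, or 2) carried by one individual.
    """
    score = 0
    for snp in risk_snps:
        count = genotypes[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"invalid risk-allele count for {snp}: {count}")
        score += count
    return score


# Hypothetical SNP identifiers and genotypes (illustration only).
RISK_SNPS = ["snp_1", "snp_2", "snp_3"]
person = {"snp_1": 2, "snp_2": 0, "snp_3": 1}

score = genetic_risk_score(person, RISK_SNPS)
max_score = 2 * len(RISK_SNPS)
print(f"GRS = {score} of a possible {max_score}")
```

The score would then enter a regression model as a single independent variable, which is what makes the resulting estimate an average effect per risk allele.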
Using genetics to establish true causality between a clinical analyte and disease is problematic, however. The idea, termed Mendelian randomization, is based on the assumption of a random assortment of alleles during mating, much the way drug treatment is randomly allocated in a clinical trial (124). If the concentration of a clinical analyte is associated with a SNP, it is argued, then a potential causal role of the analyte in disease can be tested by identifying an association between the SNP and disease directly. Although intuitively appealing and potentially informative, the idea is quite controversial for several reasons (125)(126). First, there is considerable uncertainty in the relationship between the effects on clinical analyte concentrations in cross-sectional designs typical of GWASs and their effects on a longitudinal basis over a lifetime. Cross-sectional associations simply may not reflect time-varying levels, for example postprandial levels or levels in childhood and adulthood. Further, the impact of a clinical analyte on disease may itself have a longitudinal component that is not captured in cross-sectional analysis. For example, increased analyte concentrations may confer lower risk at some ages but higher risk at others, especially if the effects are modified by a second exposure from the environment or other genes. Second, the statistical power to detect associations of common variants with disease is often limited even when the effect of the same variants on a clinical analyte is strong enough for good power. For example, Fig. 2A shows adequate power in a GWAS with a few thousand individuals for effects explaining only 1%–2% of the variance in a clinical analyte, a very small amount of variation for detecting association with disease in a comparably sized sample. As a result of these limitations, many studies invoking Mendelian randomization are simply underpowered and unlikely to be informative about causal relationships.
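The power argument can be made concrete with a rough calculation. The sketch below assumes an additive SNP in Hardy-Weinberg equilibrium, an effect on disease that acts entirely through the analyte, a balanced case-control sample, and normal approximations throughout. The allele frequency, variance explained, odds ratio per SD of analyte, sample sizes, and the 5e-8 genomewide threshold are all hypothetical values chosen for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_from_expected_z(mu: float, alpha: float) -> float:
    """Two-sided power when the test statistic is N(mu, 1)."""
    crit = -STD_NORMAL.inv_cdf(alpha / 2)
    return (1 - STD_NORMAL.cdf(crit - mu)) + STD_NORMAL.cdf(-crit - mu)


# Assumed parameters (illustration only).
MAF = 0.30               # risk-allele frequency
VAR_EXPLAINED = 0.01     # SNP explains 1% of analyte variance
OR_PER_SD = 1.3          # disease odds ratio per 1 SD increase in analyte
N_CASES = N_CONTROLS = 2000

# Power for the SNP-analyte association at genomewide significance.
n_total = N_CASES + N_CONTROLS
mu_analyte = sqrt(n_total * VAR_EXPLAINED / (1 - VAR_EXPLAINED))
power_analyte = power_from_expected_z(mu_analyte, 5e-8)

# Implied per-allele effect on disease if the SNP acts only via the analyte.
sd_per_allele = sqrt(VAR_EXPLAINED / (2 * MAF * (1 - MAF)))
log_or_per_allele = sd_per_allele * log(OR_PER_SD)

# Approximate standard error of the per-allele log odds ratio (allelic test).
se_log_or = sqrt(
    1 / (2 * N_CASES * MAF * (1 - MAF)) + 1 / (2 * N_CONTROLS * MAF * (1 - MAF))
)
power_disease = power_from_expected_z(log_or_per_allele / se_log_or, 0.05)

print(f"implied per-allele odds ratio: {exp(log_or_per_allele):.3f}")
print(f"power, SNP vs analyte (P < 5e-8): {power_analyte:.2f}")
print(f"power, SNP vs disease (P < 0.05): {power_disease:.2f}")
```

Even with the far more lenient threshold for the disease test, the implied per-allele odds ratio is so close to 1 under these assumptions that the disease association is unlikely to be detected in a sample of the size that comfortably identifies the analyte association.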
Most importantly, given the interconnected nature of biochemical pathways, it is extremely unlikely for a genetic variant to affect levels of a clinical analyte without also simultaneously affecting other analytes and molecular processes, some of which may also influence disease. For example, a recently described variant in GCKR increases plasma concentrations of triglycerides, apoB, and plasma CRP, all risk factors for cardiovascular disease and diabetes (32)(69)(70). This same variant, however, increases plasma concentrations of ApoA1 and decreases fasting glucose, trends that are protective against cardiovascular disease and diabetes (70)(127). In this case, the relationship between any of these analytes alone and cardiovascular disease would be hard to discern based on the genetic data alone, even if adequate statistical power were available. Similarly, in the case of the GRS constructed from SNPs with effects on lipid fractions (see above), the association with risk persisted after adjustment for the lipid fractions themselves, suggesting additional effects of the genetic variation not captured by the cross-sectional measure of lipid concentration (95). Given the caveats, one must conclude that a positive result in a Mendelian randomization analysis may suggest a causal link between an analyte and disease, but the absence of an effect is very difficult to interpret.
 |
Conclusion
|
|---|
Recent technologies allow comprehensive survey of common variation in the human genome and its testing for association with plasma concentrations of clinical analytes. The first published reports using these technologies are discovering new functions for the genome and new hypotheses for understanding human biology and disease. If the human genome sequence has enabled technologies for GWAS, then the correlation of common genetic variation on a genomewide basis with clinical analytes, or any other outcome of interest, begins to reveal how the parts of the genome work together in human physiology.
 |
Acknowledgments
|
|---|
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:
Employment or Leadership: None declared.
Consultant or Advisory Role: P.M. Ridker, Astra-Zeneca, Novartis, Merck Schering Plough, Sanofi-Aventis, Siemens, and ISIS.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: D.I. Chasman, National Heart, Lung, and Blood Institute and Donald W. Reynolds Foundation; G. Paré, Fonds de la Recherche en Santé du Quebec; P.M. Ridker, National Heart, Lung, and Blood Institute; Amgen, Inc.; and Donald W. Reynolds Foundation.
Expert Testimony: None declared.
Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.
 |
Footnotes
|
|---|
1 Nonstandard abbreviations: CRP, C-reactive protein; GWAS, genomewide association study; SNP, single nucleotide polymorphism; LD, linkage disequilibrium; sICAM-1, soluble intercellular adhesion molecule 1; QQ, quantile-quantile; apoB, apolipoprotein B; GRS, genetic risk score. 
2 Human genes: LCT, lactase; PCSK9, proprotein convertase subtilisin/kexin type 9; SORT1, sortilin 1; GCKR, glucokinase (hexokinase 4) regulator; CFH, complement factor H; KIF6, kinesin family member 6; LEPR, leptin receptor; HNF1A, HNF1 homeobox A; APOE, apolipoprotein E; CRP, C-reactive protein, pentraxin-related; IL6R, interleukin-6 receptor. 
 |
References
|
|---|
- Brown MS, Goldstein JL. A receptor-mediated pathway for cholesterol homeostasis. Science (Wash DC) 1986;232:34-47.
- Hobbs HH, Brown MS, Goldstein JL. Molecular genetics of the LDL receptor gene in familial hypercholesterolemia. Hum Mutat 1992;1:445-466.
- Rader DJ, Cohen J, Hobbs HH. Monogenic hypercholesterolemia: new insights in pathogenesis and treatment. J Clin Invest 2003;111:1795-1803.
- Perrone RD, Madias NE, Levey AS. Serum creatinine as an index of renal function: new insights into old concepts. Clin Chem 1992;38:1933-1953.
- Ridker PM, Rifai N, Cook NR, Bradwin G, Buring JE. Non-HDL cholesterol, apolipoproteins A-I and B100, standard lipid measures, lipid ratios, and CRP as risk factors for cardiovascular disease in women. JAMA 2005;294:326-333.
- Pradhan AD, Rifai N, Buring JE, Ridker PM. Hemoglobin A1c predicts diabetes but not cardiovascular disease in nondiabetic women. Am J Med 2007;120:720-727.
- Ridker PM, Buring JE, Rifai N, Cook NR. Development and validation of improved algorithms for the assessment of global cardiovascular risk in women: the Reynolds Risk Score. JAMA 2007;297:611-619.
- Ingelsson E, Schaefer EJ, Contois JH, McNamara JR, Sullivan L, Keyes MJ, et al. Clinical utility of different lipid measures for prediction of coronary heart disease in men and women. JAMA 2007;298:776-785.
- Ingelsson E, Pencina MJ, Tofler GH, Benjamin EJ, Lanier KJ, Jacques PF, et al. Multimarker approach to evaluate the incidence of the metabolic syndrome and longitudinal changes in metabolic risk factors: the Framingham Offspring Study. Circulation 2007;116:984-992.
- Zethelius B, Berglund L, Sundstrom J, Ingelsson E, Basu S, Larsson A, et al. Use of multiple biomarkers to improve the prediction of death from cardiovascular causes. N Engl J Med 2008;358:2107-2116.
- Bansal S, Buring JE, Rifai N, Mora S, Sacks FM, Ridker PM. Fasting compared with nonfasting triglycerides and risk of cardiovascular events in women. JAMA 2007;298:309-316.
- Bathum L, Petersen I, Christiansen L, Konieczna A, Sorensen TI, Kyvik KO. Genetic and environmental influences on plasma homocysteine: results from a Danish twin study. Clin Chem 2007;53:971-979.
- Benjamin EJ, Dupuis J, Larson MG, Lunetta KL, Booth SL, Govindaraju DR, et al. Genome-wide association with select biomarker traits in the Framingham Heart Study. BMC Med Genet 2007;8(Suppl 1):S11.
- Bielinski SJ, Pankow JS, Foster CL, Miller MB, Hopkins PN, Eckfeldt JH, et al. Circulating soluble ICAM-1 levels shows linkage to ICAM gene cluster region on chromosome 19: the NHLBI Family Heart Study follow-up examination. Atherosclerosis 2008;199:172-178.
- Bladbjerg EM, de Maat MP, Christensen K, Bathum L, Jespersen J, Hjelmborg J. Genetic influence on thrombotic risk markers in the elderly: a Danish twin study. J Thromb Haemost 2006;4:599-607.
- Boerwinkle E, Leffert CC, Lin J, Lackner C, Chiesa G, Hobbs HH. Apolipoprotein(a) gene accounts for greater than 90% of the variation in plasma lipoprotein(a) concentrations. J Clin Invest 1992;90:52-60.
- Cohen RM, Snieder H, Lindsell CJ, Beyan H, Hawa MI, Blinko S, et al. Evidence for independent heritability of the glycation gap (glycosylation gap) fraction of HbA1c in nondiabetic twins. Diabetes Care 2006;29:1739-1743.
- de Maat MP, Bladbjerg EM, Hjelmborg JB, Bathum L, Jespersen J, Christensen K. Genetic influence on inflammation variables in the elderly. Arterioscler Thromb Vasc Biol 2004;24:2168-2173.
- Ding K, Feng D, de Andrade M, Mosley TH, Jr, Turner ST, Boerwinkle E, Kullo IJ. Genomic regions that influence plasma levels of inflammatory markers in hypertensive sibships. J Hum Hypertens 2008;22:102-110.
- Dupuis J, Larson MG, Vasan RS, Massaro JM, Wilson PWF, Lipinska I, et al. Genome scan of systemic biomarkers of vascular inflammation in the Framingham Heart Study: evidence for susceptibility loci on 1q. Atherosclerosis 2005;182:307-314.
- Fenger M, Schousboe K, Sorensen TI, Kyvik KO. Variance decomposition of apolipoproteins and lipids in Danish twins. Atherosclerosis 2007;191:40-47.
- Fox CS, Yang Q, Cupples LA, Guo CY, Larson MG, Leip EP, et al. Genomewide linkage analysis to serum creatinine, GFR, and creatinine clearance in a community-based population: the Framingham Heart Study. J Am Soc Nephrol 2004;15:2457-2461.
- Heller DA, de Faire U, Pedersen NL, Dahlen G, McClearn GE. Genetic and environmental influences on serum lipid levels in twins. N Engl J Med 1993;328:1150-1156.
- Kathiresan S, Manning AK, Demissie S, D'Agostino RB, Surti A, Guiducci C, et al. A genome-wide association study for blood lipid phenotypes in the Framingham Heart Study. BMC Med Genet 2007;8(Suppl 1):S17.
- Lamon-Fava S, Jimenez D, Christian JC, Fabsitz RR, Reed T, Carmelli D, et al. The NHLBI Twin Study: heritability of apolipoprotein A-I, B, and low density lipoprotein subclasses and concordance for lipoprotein (a). Atherosclerosis 1991;91:97-106.
- Pankow JS, Folsom AR, Cushman M, Borecki IB, Hopkins PN, Eckfeldt JH, Tracy RP. Familial and genetic determinants of systemic markers of inflammation: the NHLBI family heart study. Atherosclerosis 2001;154:681-689.
- Pilia G, Chen WM, Scuteri A, Orru M, Albai G, Dei M, et al. Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet 2006;2:e132.
- Siva A, De Lange M, Clayton D, Monteith S, Spector T, Brown MJ. The heritability of plasma homocysteine, and the influence of genetic variation in the homocysteine methylation pathway. QJM 2007;100:495-499.
- Snieder H, Sawtell PA, Ross L, Walker J, Spector TD, Leslie RD. HbA(1c) levels are genetically determined even in type 1 diabetes: evidence from healthy and diabetic twins. Diabetes 2001;50:2858-2863.
- Vermeulen SH, van der Vleuten GM, de Graaf J, Hermus AR, Blom HJ, Stalenhoef AF, den Heijer M. A genome-wide linkage scan for homocysteine levels suggests three regions of interest. J Thromb Haemost 2006;4:1303-1307.
- Weiss LA, Pan L, Abney M, Ober C. The sex-specific genetic architecture of quantitative traits in humans. Nat Genet 2006;38:218-222.
- Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H, et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science (Wash DC) 2007;316:1331-1336.
- Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 2005;6:95-108.
- Pearson TA, Manolio TA. How to interpret a genome-wide association study. JAMA 2008;299:1335-1344.
- McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008;9:356-369.
- Wang WY, Barratt BJ, Clayton DG, Todd JA. Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005;6:109-118.
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29:308-311.
- Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature (Lond) 2007;449:851-861.
- Wall JD, Pritchard JK. Haplotype blocks and linkage disequilibrium in the human genome. Nat Rev Genet 2003;4:587-597.
- Bhangale TR, Rieder MJ, Nickerson DA. Estimating coverage and power for genetic association studies using near-complete variation data. Nat Genet 2008;40:841-843.
- de Bakker PI, Burtt NP, Graham RR, Guiducci C, Yelensky R, Drake JA, et al. Transferability of tag SNPs in genetic association studies in multiple populations. Nat Genet 2006;38:1298-1303.
- Cavalli-Sforza LL. Human evolution and its relevance for genetic epidemiology. Annu Rev Genomics Hum Genet 2007;8:1-15.
- Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, et al. The structure of haplotype blocks in the human genome. Science (Wash DC) 2002;296:2225-2229.
- Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, Stanley SE, et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science (Wash DC) 2001;293:489-493.
- Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, et al. Whole-genome patterns of common DNA variation in three human populations. Science (Wash DC) 2005;307:1072-1079.
- Kennedy GC, Matsuzaki H, Dong S, Liu WM, Huang J, Liu G, et al. Large-scale genotyping of complex DNA. Nat Biotechnol 2003;21:1233-1237.
- Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS. A genome-wide scalable SNP genotyping assay using microarray technology. Nat Genet 2005;37:549-554.
- Weinberg CR, Shore DL, Umbach DM, Sandler DP. Using risk-based sampling to enrich cohorts for endpoints, genes, and exposures. Am J Epidemiol 2007;166:447-455.
- Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, Oksenberg JR, et al. Methods for high-density admixture mapping of disease genes. Am J Hum Genet 2004;74:979-1000.
- Laird NM, Lange C. Family-based designs in the age of large-scale gene-association studies. Nat Rev Genet 2006;7:385-394.
- Cavalli-Sforza LL. The Human Genome Diversity Project: past, present and future. Nat Rev Genet 2005;6:333-340.
- Devlin B, Roeder K, Bacanu SA. Unbiased methods for population-based association studies. Genet Epidemiol 2001;21:273-284.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559-575.
- Ardlie KG, Lunetta KL, Seielstad M. Testing for population subdivision and association in four case-control studies. Am J Hum Genet 2002;71:304-311.
- Hinds DA, Stokowski RP, Patil N, Konvicka K, Kershenobich D, Cox DR, Ballinger DG. Matching strategies for genetic association studies in structured populations. Am J Hum Genet 2004;74:317-325.
- Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet 2006;2:e190.
- Campbell CD, Ogburn EL, Lunetta KL, Lyon HN, Freedman ML, Groop LC, et al. Demonstrating stratification in a European American population. Nat Genet 2005;37:868-872.
- Ridker PM, Chasman DI, Zee RY, Parker A, Rose L, Cook NR, Buring JE. Rationale, design, and methodology of the Women's Genome Health Study: a genome-wide association study of more than 25,000 initially healthy American women. Clin Chem 2008;54:249-255.
- Register TC, Burdon KP, Lenchik L, Bowden DW, Hawkins GA, Nicklas BJ, et al. Variability of serum soluble intercellular adhesion molecule-1 measurements attributable to a common polymorphism. Clin Chem 2004;50:2185-2187.
- Pare G, Chasman DI, Kellogg M, Zee RY, Rifai N, Badola S, et al. Novel association of ABO histo-blood group antigen with soluble ICAM-1: results of a genome-wide association study of 6,578 women. PLoS Genet 2008;4:e1000118.
- Plagnol V, Cooper JD, Todd JA, Clayton DG. A method to address differential bias in genotyping in large-scale association studies. PLoS Genet 2007;3:e74.
- Clayton DG, Walker NM, Smyth DJ, Pask R, Cooper JD, Maier LM, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet 2005;37:1243-1246.
- Rabbee N, Speed TP. A genotype calling algorithm for Affymetrix SNP arrays. Bioinformatics 2006;22:7-12.
- Lin S, Carvalho B, Cutler DJ, Arking DE, Chakravarti A, Irizarry RA. Validation and extension of an empirical Bayes method for SNP calling on Affymetrix microarrays. Genome Biol 2008;9:R63.
- Scharpf RB, Ting JC, Pevsner J, Ruczinski I. SNPchip: R classes and methods for SNP array data. Bioinformatics 2007;23:627-628.
- Illumina. Infinium HD DNA analysis BeadChips data sheet.
- Affymetrix. Affymetrix Genome-wide Human SNP Array 6.0 data sheet.
- Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature (Lond) 2007;447:661-678.
- Ridker PM, Pare G, Parker A, Zee RY, Danik JS, Buring JE, et al. Loci related to metabolic-syndrome pathways including LEPR, HNF1A, IL6R, and GCKR associate with plasma C-reactive protein: the Women's Genome Health Study. Am J Hum Genet 2008;82:1185-1192.
- Chasman DI, Pare G, Zee RY, Parker AN, Cook NR, Buring JE, et al. Genetic loci associated with plasma concentration of low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, triglycerides, apolipoprotein A1, and apolipoprotein B among 6382 white women in genome-wide analysis with replication. Circ Cardiovasc Genet 2008;1:21-30.
- Lettre G, Lange C, Hirschhorn JN. Genetic model testing and statistical power in population-based association studies of quantitative traits. Genet Epidemiol 2007;31:358-362.
- Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 2003;100:9440-9445.
- Conrad DF, Jakobsson M, Coop G, Wen X, Wall JD, Rosenberg NA, Pritchard JK. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet 2006;38:1251-1260.
- Sabeti P, Usen S, Farhadian S, Jallow M, Doherty T, Newport M, et al. CD40L association with protection from severe malaria. Genes Immun 2002;3:286-291.
- Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, et al. Genome-wide detection and characterization of positive selection in human populations. Nature (Lond) 2007;449:913-918.
- Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet 2007;39:31-40.
- Hines LM, Stampfer MJ, Ma J, Gaziano JM, Ridker PM, Hankinson SE, et al. Genetic variation in alcohol dehydrogenase and the beneficial effect of moderate alcohol consumption on myocardial infarction. N Engl J Med 2001;344:549-555.
- Lasky-Su J, Lyon HN, Emilsson V, Heid IM, Molony C, Raby BA, et al. On the replication of genetic associations: timing can be everything! Am J Hum Genet 2008;82:849-858.
- Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ. Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet 2008;4:e1000130.
- Lunn DJ, Whittaker JC, Best N. A Bayesian toolkit for genetic association studies. Genet Epidemiol 2006;30:231-247.
- Servin B, Stephens M. Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet 2007;3:e114.
- Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, Hu T, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 2008;40:638-645.
- Lettre G, Jackson AU, Gieger C, Schumacher FR, Berndt SI, Sanna S, et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet 2008;40:584-591.
- Thomas DC. The need for a systematic approach to complex pathways in molecular epidemiology. Cancer Epidemiol Biomarkers Prev 2005;14:557-559.
- Roeder K, Devlin B, Wasserman L. Improving power in genome-wide association studies: weights tip the scale. Genet Epidemiol 2007;31:741-747.
- Chasman DI. On the utility of gene set methods in genomewide association studies of quantitative traits. Genet Epidemiol 2008;32:658-668.
- Cohen JC, Boerwinkle E, Mosley TH, Jr, Hobbs HH. Sequence variations in PCSK9, low LDL, and protection against coronary heart disease. N Engl J Med 2006;354:1264-1272.
- Scartezini M, Hubbart C, Whittall RA, Cooper JA, Neil AH, Humphries SE. The PCSK9 gene R46L variant is associated with lower plasma lipid levels and cardiovascular risk in healthy U.K. men. Clin Sci (Lond) 2007;113:435-441.
- Smith SJ, Cooper GR, Myers GL, Sampson EJ. Biological variability in concentrations of serum lipids: sources of variation among results from published studies and composite predicted values. Clin Chem 1993;39:1012-1022.
- Chanock SJ, Manolio T, Boehnke M, Boerwinkle E, Hunter DJ, Thomas G, et al. Replicating genotype-phenotype associations. Nature (Lond) 2007;447:655-660.
- Chasman DI, Kozlowski P, Zee RY, Kwiatkowski DJ, Ridker PM. Qualitative and quantitative effects of APOE genetic variation on plasma C-reactive protein, LDL-cholesterol, and apoE protein. Genes Immun 2006;7:211-219.
- Reiner AP, Barber MJ, Guan Y, Ridker PM, Lange LA, Chasman DI, et al. Polymorphisms of the HNF1A gene encoding hepatocyte nuclear factor-1 alpha are associated with C-reactive protein. Am J Hum Genet 2008;82:1193-1201.
- Sandhu MS, Waterworth DM, Debenham SL, Wheeler E, Papadakis K, Zhao JH, et al. LDL-cholesterol concentrations: a genome-wide association study. Lancet 2008;371:483-491.
- Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, Clarke R, et al. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet 2008;40:161-169.
- Kathiresan S, Melander O, Anevski D, Guiducci C, Burtt NP, Roos C, et al. Polymorphisms associated with cholesterol and risk of cardiovascular events. N Engl J Med 2008;358:1240-1249.
- Helgadottir A, Thorleifsson G, Magnusson KP, Gretarsdottir S, Steinthorsdottir V, Manolescu A, et al. The same sequence variant on 9p21 associates with myocardial infarction, abdominal aortic aneurysm and intracranial aneurysm. Nat Genet 2008;40:217-224.
- Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, Mayer B, et al. Genomewide association analysis of coronary artery disease. N Engl J Med 2007;357:443-453.
- McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, et al. A common allele on chromosome 9 associated with coronary heart disease. Science (Wash DC) 2007;316:1488-1491.
- Steinthorsdottir V, Thorleifsson G, Reynisdottir I, Benediktsson R, Jonsdottir T, Walters GB, et al. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat Genet 2007;39:770-775.
- Gudmundsson J, Sulem P, Steinthorsdottir V, Bergthorsson JT, Thorleifsson G, Manolescu A, et al. Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet 2007;39:977-983.
- Cauchi S, Meyre D, Durand E, Proenca C, Marre M, Hadjadj S, et al. Post genome-wide association studies of novel genes associated with type 2 diabetes show gene-gene interaction and high predictive value. PLoS ONE 2008;3:e2031.
- Sladek R, Rocheleau G, Rung J, Dina C, Shen L, Serre D, et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature (Lond) 2007;445:881-885.
- Todd JA, Walker NM, Cooper JD, Smyth DJ, Downes K, Plagnol V, et al. Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nat Genet 2007;39:857-864.
- Franke A, Balschun T, Karlsen TH, Hedderich J, May S, Lu T, et al. Replication of signals from recent studies of Crohn's disease identifies previously unknown disease loci for ulcerative colitis. Nat Genet 2008;40:713-715.
- Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH, Rioux JD, et al. Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat Genet 2008;40:955-962.
- Swaroop A, Branham KE, Chen W, Abecasis G. Genetic susceptibility to age-related macular degeneration: a paradigm for dissecting complex disease traits. Hum Mol Genet 2007;16(Spec 2):R174-R182.
- Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, et al. Complement factor H polymorphism in age-related macular degeneration. Science (Wash DC) 2005;308:385-389.
- Hageman GS, Anderson DH, Johnson LV, Hancox LS, Taiber AJ, Hardisty LI, et al. A common haplotype in the complement regulatory gene factor H (HF1/CFH) predisposes individuals to age-related macular degeneration. Proc Natl Acad Sci U S A 2005;102:7227-7232.
- Gudbjartsson DF, Arnar DO, Helgadottir A, Gretarsdottir S, Holm H, Sigurdsson A, et al. Variants conferring risk of atrial fibrillation on chromosome 4q25. Nature (Lond) 2007;448:353-357.
- Shiffman D, Chasman DI, Zee RY, Iakoubova OA, Louie JZ, Devlin JJ, Ridker PM. A kinesin family member 6 variant is associated with coronary heart disease in the Women's Health Study. J Am Coll Cardiol 2008;51:444-448.
- Iakoubova OA, Sabatine MS, Rowland CM, Tong CH, Catanese JJ, Ranade K, et al. Polymorphism in KIF6 gene and benefit from statins after acute coronary syndromes: results from the PROVE IT-TIMI 22 study. J Am Coll Cardiol 2008;51:449-455.
- Iakoubova O, Shepherd J, Sacks F. Association of the 719Arg variant of KIF6 with both increased risk of coronary events and with greater response to statin therapy. J Am Coll Cardiol 2008;51:2195; author reply 2195-2196.
- Ridker PM, Miletich JP, Hennekens CH, Buring JE. Ethnic distribution of factor V Leiden in 4047 men and women: implications for venous thromboembolism screening. JAMA 1997;277:1305-1307.
- Strazzullo P, Galletti F. Genetics of salt-sensitive hypertension. Curr Hypertens Rep 2007;9:25-32.
- Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet 2006;38:209-213.
- Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 2007;39:1181-1186.
- Manolio TA, Rodriguez LL, Brooks L, Abecasis G, Ballinger D, Daly M, et al. New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nat Genet 2007;39:1045-1051.
- Li Y, Abecasis GR. Mach 1.0: Rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet 2006;S79:2290.
- Blow N. Genomics: the personal side of genomics. Nature (Lond) 2007;449:627-630.
- Kooner JS, Chambers JC, Aguilar-Salinas CA, Hinds DA, Hyde CL, Warnes GR, et al. Genome-wide scan identifies variation in MLXIPL associated with plasma triglycerides. Nat Genet 2008;40:149-151.
- Morrison AC, Bare LA, Chambless LE, Ellis SG, Malloy M, Kane JP, et al. Prediction of coronary heart disease risk using a genetic risk score: the Atherosclerosis Risk in Communities Study. Am J Epidemiol 2007;166:28-35.
- Bare LA, Morrison AC, Rowland CM, Shiffman D, Luke MM, Iakoubova OA, et al. Five common gene variants identify elevated genetic risk for coronary heart disease. Genet Med 2007;9:682-689.
- Drenos F, Whittaker JC, Humphries SE. The use of meta-analysis risk estimates for candidate genes in combination to predict coronary heart disease risk. Ann Hum Genet 2007;71:611-619.
- Smith GD, Ebrahim S. Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol 2004;33:30-42.
- Nitsch D, Molokhia M, Smeeth L, DeStavola BL, Whittaker JC, Leon DA. Limits to causal inference based on Mendelian randomization: a comparison with randomized controlled trials. Am J Epidemiol 2006;163:397-403.
- Glynn RJ. Commentary: genes as instruments for evaluation of markers and causes. Int J Epidemiol 2006;35:932-934.
- Orho-Melander M, Melander O, Guiducci C, Perez-Martinez P, Corella D, Roos C, et al. Common missense variant in the glucokinase regulatory protein gene (GCKR) is associated with increased plasma triglyceride and C-reactive protein but lower fasting glucose concentrations. Diabetes 2008;57:3112-3121.