Mini-Review
1 Department of Laboratory Medicine, Clinical Center, National Institutes of Health, Bethesda, MD; 2 Plasma Proteome Institute, Washington, DC.
Address correspondence to this author at: Department of Laboratory Medicine, NIH, Building 10, Room 2C-407, 10 Center Dr., Bethesda, MD 20892-1508. Fax 301-402-1885; e-mail ghortin{at}mail.cc.nih.gov.
Abstract
Background: Plasma contains thousands of proteins, but a small number of these proteins account for the majority of protein molecules and mass.
Content: We surveyed proteomic studies to identify candidates for high-abundance polypeptide chains. We searched the literature for information on the plasma concentrations of the most abundant components in healthy adults and for the molecular mass of the mature polypeptide chains in plasma. Because proteomic studies usually dissociate proteins into polypeptide chains or detect short peptide segments of proteins, we summarized data on individual peptide chains for proteins containing multiple subunits or polypeptides. We collected data on about 150 of the most abundant polypeptides in plasma. The abundant polypeptides span approximately the top 4 logs of concentration in plasma, from 650 to 0.06 µmol/L on a molar basis or from about 50 000 to 1 mg/L mass abundance.
Conclusions: Data on the concentrations of the high-abundance peptide chains in plasma assist in understanding the composition of plasma and potential approaches for clinical laboratory or proteomic analysis of plasma proteins. Development of more extensive databases regarding the plasma concentrations of proteins in health and diseases would promote diagnostic and proteomic advances.
the abundance of components of the plasma proteome
Recent applications of diverse proteomic methods have enabled the detection of >3000 different protein components in plasma, with high-confidence identification of about one-third of the components(1). The identification of individual components in plasma represents only a first stage in proteomic analysis, however. Quantitative analysis of components represents an important next step. Determination of the relative or absolute concentration of individual components in many plasma specimens usually represents a key step toward understanding the physiological significance or diagnostic potential of individual proteins. Quantitative analysis of the many components in the plasma proteome represents a major challenge, because the concentrations of different proteins extend over 12 orders of magnitude(2)(3). There are a small number of proteins with very high abundance and a gradually increasing number of proteins at lower abundance. A single protein, albumin, represents more than half of the protein mass, and the dozen most abundant proteins usually comprise >95% of the total protein mass. The small number of high-abundance components tends to dominate most forms of proteomic analysis and limit the ability to detect low-abundance components. Selective depletion of a dozen high-abundance proteins extends analyses down about 1–2 orders of magnitude on the abundance scale(4)(5)(6)(7)(8)(9)(10)(11). This approach has extended the depth of proteome analysis, although it presents challenges in unintended losses of minor components and issues of reproducibility(8)(9)(10)(11). The present survey of high-abundance proteins suggests that depletion of about 150 major polypeptides extends the range of analysis to about 10 000-fold lower concentrations. A number of the high-abundance polypeptides are components of multichain proteins, so the total number of proteins requiring depletion is smaller than the number of polypeptides.
Although many efforts at proteomic analysis have considered high-abundance proteins to be a hindrance in the search for more interesting minor components, the high-abundance proteins represent many physiologically important molecules, such as immunoglobulins, apolipoproteins, protease inhibitors, coagulation factors, complement factors, and carrier molecules(2)(12)(13)(14). Quantitative analysis of many of these high-abundance components is diagnostically useful for assessing nutrition, immune status, disorders of coagulation or fibrinolysis, disorders of lipoproteins, and acute-phase responses to injury or disease. Therefore, measurement of these molecules is an important part of clinical chemistry practice.
Because of their physiological and diagnostic significance as well as their impact on proteomic analysis, we sought to identify the concentrations of the most abundant plasma polypeptides and develop an approximate ranking of these components. This survey provides a ranking of the abundance of individual polypeptides on the basis of either molar concentration (Table 1) or mass abundance (Supplementary Table 1, which accompanies the online version of this article at http://www.clinchem.org/content/vol54/issue10). Ranking of the concentrations of polypeptide chains rather than of intact proteins offers a more representative measure for quantitative mass spectrometric methods that analyze dissociated polypeptide chains or small peptide fragments of proteins(15)(16)(17)(18).
approach for identifying high-abundance plasma proteins
Decades of efforts at fractionation of plasma protein components and application of tools of protein chemistry identified more than 50 high-abundance protein components as summarized by Putnam in 1975(12) and Peters in 1983(13) and provided information about the structure of molecular forms in the circulation. We surveyed more recent studies using 2-dimensional electrophoresis for major components and bottom-up proteomic studies for frequently identified components(19)(20)(21)(22)(23)(24)(25)(26), and we examined studies that determined the concentrations of multiple plasma protein components(27)(28). We searched data on human plasma proteins in the Peptide Atlas (http://www.peptideatlas.org) of the Institute for Systems Biology for polypeptides with >20 identifications of tryptic peptides. We conducted literature surveys using MedLine and Google to identify information on the concentration and structure of the circulating forms of plasma proteins, and we surveyed textbooks for summary information on coagulation and complement components. We also surveyed reference intervals for clinical assays of proteins to identify components of high abundance and the reference intervals for these components. Sources of data are summarized in Supplementary Table 2 in the online Data Supplement. Components were ranked according to the mean of their concentration range for healthy adults, where a range was identified. In some cases, where concentrations have an unusual distribution in the population, ranking by the median concentration or geometric mean might be preferable, but this was not done in the present survey because many references do not provide detailed information about the median or distribution of values in the population.
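The ranking rule described above can be illustrated with a minimal sketch. The component names and interval limits below are hypothetical placeholders, not values from Table 1 or the supplementary tables, and the geometric mean of the two interval limits is used only as a rough stand-in for a population geometric mean, which would require the underlying distribution.

```python
from math import sqrt

# Hypothetical reference intervals (low, high) in mg/L, for illustration only;
# these are placeholders, not values taken from the article's tables.
reference_intervals_mg_per_l = {
    "protein_A": (35000.0, 50000.0),
    "protein_B": (2000.0, 4000.0),
    "protein_C": (100.0, 6400.0),  # deliberately wide, skewed interval
}


def midpoint(low, high):
    """Arithmetic mean of the interval limits (the ranking basis described in the text)."""
    return (low + high) / 2.0


def geometric_midpoint(low, high):
    """Geometric mean of the interval limits (assumed proxy for skewed distributions)."""
    return sqrt(low * high)


def rank_by(intervals, center):
    """Return component names sorted from most to least abundant by the chosen center."""
    return sorted(intervals, key=lambda name: center(*intervals[name]), reverse=True)


ranking_mean = rank_by(reference_intervals_mg_per_l, midpoint)
ranking_geo = rank_by(reference_intervals_mg_per_l, geometric_midpoint)
print(ranking_mean)
print(ranking_geo)
```

With these placeholder values the two rules disagree about the order of protein_B and protein_C, which is the kind of shift a wide or skewed interval might produce.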
We obtained structural information from sequence databases and references describing protein analyses to identify the mass of the molecular forms of polypeptides that occur in the circulation rather than a calculated mass that does not account for the posttranslational modifications of many of the proteins. Polypeptide masses determined by mass spectrometry are listed where this information was available(29)(30).
ranking of the most abundant polypeptides in plasma
An approximate ranking of the molar abundance of plasma polypeptides in healthy adults is provided in Table 1. This ranking should be most applicable for analytical approaches that respond to the number of molecules. Examples of such techniques are most immunoassay and mass spectrometric methods, where the signal response is related to the number of molecules rather than the size of molecules(30). One approach for quantification of proteins that is seeing increased application is the analysis of tryptic peptides by liquid chromatography-triple quadrupole mass spectrometers with the use of stable isotope-labeled internal standards(15)(16)(17). That technique determines the absolute concentration of one or more short peptide segments of a protein.
Ranking of polypeptides according to mass abundance (Supplementary Table 1 in the online Data Supplement) is more suitable to analytical approaches such as electrophoretic analysis with staining of polypeptides or ultraviolet detection of components resolved by electrophoresis or chromatography. High-molecular-weight components move up in this form of ranking and are more strongly represented in approaches using detection related to mass abundance rather than methods responding to the number of molecules.
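The relationship between the two rankings follows from a unit conversion: a mass concentration in mg/L divided by a polypeptide mass in kDa gives a molar concentration in µmol/L. The sketch below is illustrative only. The albumin figures are rounded, commonly cited approximations, and the other two entries are hypothetical polypeptides chosen to show how a high-mass component can rank higher by mass than by molar abundance.

```python
def mass_to_molar_umol_per_l(mass_mg_per_l, mass_kda):
    """Convert mg/L to µmol/L: (mg/L) / (kDa) = µmol/L, because 1 kDa = 1000 g/mol."""
    return mass_mg_per_l / mass_kda


# name -> (mass concentration in mg/L, polypeptide mass in kDa)
# Assumed, rounded values for illustration; not entries from the article's tables.
polypeptides = {
    "albumin": (43000.0, 66.5),
    "large_chain_X": (1000.0, 500.0),  # hypothetical high-mass polypeptide
    "small_chain_Y": (300.0, 10.0),    # hypothetical low-mass polypeptide
}

# Rank from most to least abundant on each basis.
by_mass = sorted(polypeptides, key=lambda n: polypeptides[n][0], reverse=True)
by_molar = sorted(
    polypeptides,
    key=lambda n: mass_to_molar_umol_per_l(*polypeptides[n]),
    reverse=True,
)

print(by_mass)
print(by_molar)
```

Under these assumptions albumin comes out near 650 µmol/L, in line with the upper end of the molar range quoted in the abstract, while the hypothetical high-mass chain drops below the low-mass chain once ranked by number of molecules.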
Whether polypeptides are best ranked by molar concentration or by mass abundance, and which molecular forms are ranked, depends on the state of the proteins in the method of analysis and on the method of detection. A previous, more limited survey(30) provided a ranking of the molar abundance of covalently bound polypeptide complexes, which would differ from the present ranking for proteins containing multiple disulfide-linked chains, such as immunoglobulins. That ranking of proteins was directed at analysis of proteins by MALDI-TOF mass spectrometry, which does not dissociate disulfide-linked polypeptides. One therefore needs to consider whether proteins are analyzed under denaturing or reducing conditions.
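As a rough sketch of why the two kinds of ranking differ for multichain proteins, the molar concentration of each dissociated chain can be taken as the concentration of the intact complex multiplied by the number of copies of that chain per molecule. The stoichiometry below reflects the standard IgG structure of 2 heavy and 2 light chains; the 70 µmol/L concentration is an assumed placeholder, not a value from the article.

```python
def chain_concentrations(protein_umol_per_l, chains_per_molecule):
    """Molar concentration of each chain after the complex is fully dissociated."""
    return {
        chain: protein_umol_per_l * copies
        for chain, copies in chains_per_molecule.items()
    }


# Copies of each chain per intact molecule (standard IgG stoichiometry).
igg_like = {"heavy_chain": 2, "light_chain": 2}

# Assumed concentration of the intact, disulfide-linked complex in µmol/L.
intact_umol_per_l = 70.0

chains = chain_concentrations(intact_umol_per_l, igg_like)
print(chains)
```

A method that leaves disulfide bonds intact would see one species at the complex concentration, whereas analysis under reducing conditions would see each chain at twice that molar concentration in this example.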
developing databases of polypeptide and protein concentrations
Two of the most important parameters in the analysis of plasma protein components are the sequences of individual components and their concentrations. Accurately assessing the concentration often represents the more challenging problem, as concentration may vary with many physiological, population, preanalytical, and analytical variables. Many years of experience in clinical chemistry laboratories show that establishing reference intervals for specific components can be a challenging task that can require widespread collaboration for standardization of analyses and the application of best available reference methods of analysis(31). Experience with the analysis of apolipoproteins provides good examples of the method-dependent variation in results and challenges in establishing reference intervals(32). For many of the entries in Table 1, available studies offer concentration data on only a small number of specimens, physiological characteristics of subjects providing the specimens may be incompletely identified, and methods used for analysis may be imprecise or lacking well-characterized standards for calibration. There clearly is a need for more extensive reference range data for many of the abundant polypeptides.
The method of specimen collection can have significant effects on the composition of components. In particular, there is considerable change in the composition of plasma when it is allowed to clot and to form serum. Some of the coagulation components (such as fibrinogen chains) are removed, activation peptides are generated, and platelet components are released(30)(33)(34). Plasma specimens generally have been preferred for proteomic studies(34). Many physiological and pathological processes also lead to major changes in the concentrations of multiple plasma proteins. The acute-phase response is a well-characterized example of a process dramatically affecting the concentrations of many components(35). Thus, it becomes necessary to determine the changes in concentration of a protein for each physiological or pathological process of interest as well as the reference intervals for healthy subjects.
The issues noted above present challenges in developing a database of reference ranges for protein or polypeptide concentrations. This challenge must be addressed routinely by clinical laboratories or vendors of in vitro diagnostic tests and for special populations such as pediatric patients(36)(37)(38). To be of optimal utility, data on protein concentrations need to include information on the population, specimen types, analytical method, and distribution of results. Collection of more extensive data on the concentrations and structural variation of a wider range of plasma components would assist in providing a general frame of reference for plasma proteomic and diagnostic analysis.
The present effort to develop a listing of high-abundance polypeptides is admittedly a simplification. Some proteins undergo proteolytic processing that can yield internally cleaved or truncated forms, and there can be variation in posttranslational modification of proteins. Many of the posttranslational modifications of individual polypeptides are listed in Supplementary Table 2 in the online Data Supplement. Supplementary Table 2 also includes synonyms for protein names, links to sequence databases, and references for polypeptide abundance and mass. Some genes are duplicated, such as those for α1-acid glycoprotein (genes 1 and 2) and the fourth component of complement (C4A and C4B). The duplicated gene products are highly homologous and are differentiated by a limited number of sequence substitutions. Only a few of the many tryptic peptides from these proteins distinguish different gene products. Some of the related polypeptides such as apolipoprotein B-48 and B-100 and haptoglobin α1- and α2-chains are more likely to be distinguished by top-down analysis of intact polypeptides than by bottom-up analysis of tryptic peptides. Immunoglobulin chains represent a set of millions of different sequences, with sequence variation of the variable domains and defined sequences for the constant domains. Therefore, the listed molar concentrations of immunoglobulin chains apply only to the constant domains and not to the complete polypeptide chains. From 2-dimensional electrophoresis, there is evidence for increased clonal expression of a few hundred light-chain molecules, which therefore achieve a concentration approximately in the 0.1–1 µmol/L range(2). It has not been determined whether there are similar subsets of heavy-chain clones with increased expression.
The present list offers representation of only a few examples of alternatively spliced forms of proteins. Fibronectin, for example, is known to occur in multiple spliced forms(39), but these were not broken out as separate entries for the present list. Advances in our understanding of the plasma proteome should allow progressive improvement in the delineation of major structural variants of proteins and their concentrations.
The list of high-abundance components in Table 1 and Supplementary Table 1 in the online Data Supplement should be viewed as a work in progress. As additional quantitative data about the abundance of plasma polypeptides become available, the ranking of some components will change and additional polypeptides within the high-abundance range will be added. The Peptide Atlas, which identifies tryptic peptides derived from thousands of polypeptides in human plasma, represents a database that might be further explored for candidates for high-abundance polypeptides. Our preliminary search of the Peptide Atlas identified a number of additional proteins with relatively frequent peptide identifications for which independent measures of protein concentration by immunoassay or other techniques were not found. Those results suggest that there will be further additions to the roster of high-abundance polypeptides.
the significance of high-abundance plasma polypeptides
The high-abundance polypeptides in Table 1 predominantly represent major secretory proteins released from abundant tissues such as liver, lymphoid and hematopoietic tissues, and intestines. A few other tissues are rarely represented, such as adipose tissue by adipsin and endothelial cells by von Willebrand factor. Intracellular proteins are represented by only a few examples such as hemoglobin chains. There is limited representation of products from other major tissues such as epidermis, bone, muscle, lung, pancreas, kidneys, and central nervous system, or of small specialized organs such as prostate, ovary, thyroid, and pituitary. Efforts such as the Human Protein Atlas(40) might help clarify the sources of high-abundance proteins and identify additional candidates from major organs that have limited representation in the current list of high-abundance proteins.
The high-abundance polypeptide components, although critical to systemic physiology, appear to represent primarily products of a few major tissues. Therefore, an interest in low-abundance components may be justified in the search for polypeptides that can serve as diagnostic markers for disorders of other tissues or of small early-stage tumors. In disorders of many tissues, the only changes detected among high-abundance components may be nonspecific responses to injury.
Acknowledgments
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors' Disclosures of Potential Conflicts of Interest: Upon submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:
Employment or Leadership: None declared.
Consultant or Advisory Role: None declared.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: Glen L Hortin, NIH Clinical Center, Department of Health and Human Services; Leigh Anderson, National Cancer Institute's Clinical Proteomic Technology Assessment for Cancer Program (grant U24-CA126476).
Expert Testimony: None declared.
Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.
References
α2-glycoprotein (ZAG) by LC-MS/MS: a potential serum biomarker for prostate cancer. Clin Chem 2007;53:673-678.