Evidence-Based Laboratory Medicine and Test Utilization
1 Assistance Publique-Hôpitaux de Paris Groupe Hospitalier Pitié-Salpêtrière, Paris, France.
2 Laboratoire Alphabio, Marseille, France.
3 Service d'Hépato-gastroentérologie, Hôpital Haut Lévêque, Pessac, France.
4 Biopredictive, Paris, France.
5 Service d'Hépatogastroentérologie, Hôpital Saint-Joseph, Marseille, France.
a Address correspondence to this author at: Assistance Publique-Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière, 47-83 Boulevard de l'Hôpital, 75651 Paris Cedex 13, France. Fax 33-142161427; e-mail tpoynard@teaser.fr.
Abstract
Methods: The AUCs of FibroTest (FT) for the diagnosis of advanced fibrosis were estimated in patients with chronic hepatitis C using an integrated database including 1312 patients with FT and biopsy, and in an overview of 18 diagnostic studies.
Results: In the integrated database, the FT AUC for advanced fibrosis varied (P <0.001) with stage prevalence, from 0.67 (only stage F2 as advanced fibrosis and only F1 as nonadvanced fibrosis) to 0.98 (only F4 as advanced fibrosis and only F0 as nonadvanced fibrosis). Similar results were observed in the overview, in which the FT AUC varied (P <0.001) from 0.65 to 0.89 according to fibrosis stage prevalence. Two approaches for expressing standardized AUCs were developed: one assumed a uniform prevalence distribution of fibrosis stages; the other used the distribution of fibrosis stages observed in the population.
Conclusions: The expressions of the AUCs of fibrosis markers should be standardized according to the prevalence of fibrosis stages defining advanced and nonadvanced fibrosis.
Background
These new markers quantitatively estimate liver fibrosis, and their diagnostic accuracy is evaluated using the area under the ROC curve (AUC) (2)(4). The AUC combines the sensitivity and specificity of a given quantitative marker, across all possible cutoffs, for the diagnosis of a specific definition of fibrosis. Sensitivity is usually assessed in patients with advanced fibrosis (i.e., stages F2, F3, and F4 in the METAVIR scoring system) (4)(5) and specificity in patients with nonadvanced fibrosis (i.e., stages F0 and F1).
Since 1991, we have identified parameters associated with fibrosis and fibrosis progression and constructed several panels combining these parameters, including the PGA index (6), age/platelets index (7), and the FibroTest (FT) (8). We and others have observed that 2 factors are associated with the AUCs: biopsy length and biopsy fragmentation (4)(9)(10)(11)(12)(13)(14).
No study has examined the importance of the spectrum of stages defining advanced or nonadvanced fibrosis. Over the last 25 years, we identified 35 reviews and 25 editorials on noninvasive fibrosis markers (see Appendix in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol53/issue9), 19 of which were published in 2005 and 2006. All 19 of these recent publications discussed the sensitivity, specificity, or AUC of noninvasive markers for the diagnosis of advanced fibrosis, but none took into account the prevalence of the fibrosis stages defining advanced or nonadvanced fibrosis.
The aim of this study was to assess, in a large number of patients, the relationships between the AUCs and the prevalence of each fibrosis stage defining nonadvanced and advanced fibrosis, in order to better standardize the expression of fibrosis marker performance. We anticipated that such an evaluation would allow more accurate comparisons between fibrosis markers. The relationships between AUCs and biopsy length and fragmentation have been analyzed in a separate study and were found to be less strongly associated with variability in AUCs than the prevalence of each fibrosis stage defining nonadvanced and advanced fibrosis (14).
Materials and Methods
The uniform distribution of fibrosis stages was defined by a prevalence of 0.20 for each of the 5 stages (uniform prevalence). With uniform prevalence, the mean fibrosis stage is 3 METAVIR units for advanced fibrosis [(F2 + F3 + F4)/3 = (2 + 3 + 4)/3 = 3] vs 0.5 for nonadvanced fibrosis [(F0 + F1)/2 = (0 + 1)/2 = 0.5]. The difference between the mean fibrosis stage of advanced fibrosis and the mean fibrosis stage of nonadvanced fibrosis (DANA) is therefore 2.5 for this uniform distribution.
The naturally observed distribution of fibrosis stages was defined by the stage prevalences observed among patients with chronic hepatitis C, including those in a previous large study on the natural history of fibrosis progression (15). With liver biopsy estimates (n = 2235), the stage prevalences were F0 10%, F1 36%, F2 21%, F3 15%, and F4 18%. With these naturally observed prevalences, the mean fibrosis stage was 2.94 METAVIR units for advanced fibrosis [(0.21 x 2 + 0.15 x 3 + 0.18 x 4)/0.54 = 2.94] vs 0.78 for nonadvanced fibrosis [(0.10 x 0 + 0.36 x 1)/0.46 = 0.78]. The DANA for this naturally observed distribution was therefore 2.16 (2.94 – 0.78 = 2.16).
The DANA was used to standardize fibrosis marker AUCs. The 1st step was to plot the AUCs of a validated fibrosis marker, FT, for the diagnosis of advanced fibrosis in a large database against different combinations of fibrosis stages, yielding a wide spectrum of DANA values. DANA can range from 1 to 4. A DANA of 1 is obtained when advanced fibrosis comprises F2 only and nonadvanced fibrosis comprises F1 only (mean of F2 = 2 minus mean of F1 = 1). A DANA of 4 is obtained when advanced fibrosis comprises F4 only and nonadvanced fibrosis comprises F0 only (mean of F4 = 4 minus mean of F0 = 0). Regressing the AUCs on DANA over this whole spectrum allows the AUC to be estimated from the DANA. The regression formula for standardizing AUCs estimated from different stage prevalences was AUC = 0.582 + 0.1056 x (DANA).
The 2nd step was to check that the relationship observed in the integrated database with individual data was similar to that observed in an overview of the published diagnostic studies of FT. Because the published studies did not report the DANA, it was estimated (eDANA) from the reported stage prevalences as the estimated mean stage of advanced fibrosis minus the estimated mean stage of nonadvanced fibrosis: eDANA = [(prevalence F2 x 2 + prevalence F3 x 3 + prevalence F4 x 4)/(prevalence F2 + prevalence F3 + prevalence F4)] – [prevalence F1/(prevalence F0 + prevalence F1)]. In the integrated database, the eDANA (from 19 different prevalence populations) agreed almost perfectly with the DANA computed directly from individual data (n = 1312) [Pearson R = 0.999991; Spearman R = 1.0 (n = 19)].
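The eDANA formula, which also reproduces the uniform (2.5) and naturally observed (2.16) DANA values derived above, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names `mean_stage` and `dana` are hypothetical, and stages are coded as the integers 0 to 4 for F0 to F4.

```python
from collections.abc import Iterable, Mapping


def mean_stage(prevalence: Mapping[int, float], stages: Iterable[int]) -> float:
    """Prevalence-weighted mean METAVIR stage over the given subset of stages."""
    stages = tuple(stages)
    weight = sum(prevalence[s] for s in stages)
    return sum(s * prevalence[s] for s in stages) / weight


def dana(prevalence: Mapping[int, float],
         advanced: Iterable[int] = (2, 3, 4),
         nonadvanced: Iterable[int] = (0, 1)) -> float:
    """Mean stage of the 'advanced' group minus mean stage of the 'nonadvanced' group."""
    return mean_stage(prevalence, advanced) - mean_stage(prevalence, nonadvanced)


# Stage prevalences (F0..F4) as given in the text
uniform = {stage: 0.20 for stage in range(5)}
natural = {0: 0.10, 1: 0.36, 2: 0.21, 3: 0.15, 4: 0.18}

print(f"uniform DANA = {dana(uniform):.2f}")  # uniform DANA = 2.50
print(f"natural DANA = {dana(natural):.2f}")  # natural DANA = 2.16

# The extreme stage groupings bound the DANA spectrum at 1 and 4
print(dana(natural, advanced=(2,), nonadvanced=(1,)))  # 1.0
print(dana(natural, advanced=(4,), nonadvanced=(0,)))  # 4.0
```

Because each group mean is normalized by that group's own total prevalence, the same function serves for the full prevalence distributions and for any sub-combination of stages.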
We defined the uniform DANA as 2.5 (each stage prevalence = 0.20, giving a mean stage of 3 METAVIR units in the advanced fibrosis group and 0.5 in the nonadvanced group). An adjusted uniform AUC (AduAUC), which relates the observed DANA to the value obtained when fibrosis stages are uniformly distributed (DANA = 2.5), can be calculated from the regression formula linking the observed AUC (ObAUC) to DANA.
We defined the naturally observed DANA as 2.16. The AUC standardized at a DANA of 2.16 is the value expected for a given test in the diagnosis of advanced fibrosis when fibrosis stages are distributed as observed in the population. A naturally observed adjusted AUC (AdnAUC), which relates the observed DANA to the population value (DANA = 2.16), can be calculated from the regression formula linking the ObAUC to DANA.
Patients
Integrated database.
The integrated database included patients with chronic hepatitis C from published prospective studies who had concomitant FT measurement and liver biopsy, whose biopsies were scored using the METAVIR scoring system (5), whose FT was assessed on fresh serum using the recommended preanalytical and analytical procedures, and whose individual data were sent by the principal investigator.
Overview of published diagnostic studies.
The literature was reviewed using PubMed with the keywords diagnostic, fibrosis, and FT. Studies were included if fibrosis had been evaluated using the METAVIR scoring system, the diagnostic value was expressed as an AUC, and the prevalence of the different fibrosis stages was given.
Liver biopsies.
In the integrated database, liver biopsies were processed using standard techniques. A pathologist who was unaware of the biochemical markers evaluated the fibrosis stage and necrosis grade according to the METAVIR scoring system (5). Fibrosis was staged on a scale of 0–4: F0 = no fibrosis, F1 = portal fibrosis without septa, F2 = few septa, F3 = numerous septa without cirrhosis, and F4 = cirrhosis. Biopsies were performed with a 16-gauge Hepafix Luer Lock needle (Braun Melsungen) in the Paris center and the Bordeaux center, and with various needles in the multicenter study from Marseille.
Biochemical markers.
The previously validated FT was used (8). FT (Biopredictive; HCV-Fibrosure, Labcorp) has been validated for the assessment of liver fibrosis in patients with chronic hepatitis C and B and in patients with alcoholic and nonalcoholic steatosis. FT is a noninvasive blood test that combines the quantitative results of 5 serum biochemical markers (α2-macroglobulin, haptoglobin, γ-glutamyl transpeptidase, total bilirubin, and apolipoprotein A1) with the patient's age and gender in a patented artificial intelligence algorithm (U.S. Patent 6 631 330) to generate a measurement of fibrosis stage in the liver.
Statistical analysis
The AUC was used as a measurement of discrimination and estimated using the empirical (nonparametric) method of DeLong et al. (16). The paired method of Zhou et al. (17) was used for statistical comparisons of AUCs. All analyses were performed with NCSS software (NCSS).
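The empirical AUC is the Mann-Whitney probability that a patient with advanced fibrosis has a higher marker value than a patient with nonadvanced fibrosis, with ties counted as one half; DeLong et al. derive its variance from per-patient "structural components". A minimal Python sketch of both follows, assuming one marker score per patient. It is not the NCSS routine used in the study, the function names are hypothetical, the scores are invented for illustration, and the paired comparison of Zhou et al. is not reproduced.

```python
from statistics import mean, variance


def psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the advanced-fibrosis score is higher, 0.5 for a tie, else 0."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def empirical_auc(advanced: list[float], nonadvanced: list[float]) -> tuple[float, float]:
    """Empirical (nonparametric) AUC and its DeLong variance estimate."""
    m, n = len(advanced), len(nonadvanced)
    if m < 2 or n < 2:
        raise ValueError("need at least 2 patients in each group")
    # Structural components: each patient's mean kernel value against the other group
    v10 = [mean(psi(x, y) for y in nonadvanced) for x in advanced]
    v01 = [mean(psi(x, y) for x in advanced) for y in nonadvanced]
    auc = mean(v10)  # equals the Mann-Whitney U divided by m * n
    var_auc = variance(v10) / m + variance(v01) / n
    return auc, var_auc


# Hypothetical marker values, for illustration only
advanced = [0.90, 0.80, 0.60, 0.55]
nonadvanced = [0.20, 0.30, 0.60, 0.70]
auc, var_auc = empirical_auc(advanced, nonadvanced)
print(auc, var_auc)  # 0.78125 0.033203125
```

The quadratic loop over all pairs keeps the sketch close to the definition; for databases the size of the integrated one (n = 1312) it is still fast enough, although rank-based implementations are more efficient.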
Results
Overview of published studies
A total of 19 populations have been studied for the diagnostic value of FT in HCV, as reported in 14 articles (Table 1) (9)(10)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29). For 3 studies described in 2 publications (26)(29), the prevalence of fibrosis stages was not detailed in the publication but was provided by the principal investigators on request. For 1 study (24), the AUC was not reported, so this study was excluded, leaving 18 included studies.
AUCs according to the prevalence of fibrosis stages
In the stage combinations drawn from the integrated database, the FT AUCs varied (P <0.001) with stage prevalence from 0.67 to 0.98, for DANA values ranging from 1.00 (the smallest difference: only stage F2 as advanced fibrosis and only F1 as nonadvanced fibrosis) to 4.00 (the largest difference: only F4 as advanced fibrosis and only F0 as nonadvanced fibrosis) (Table 2). There was a highly significant correlation between the FT AUC and DANA [Spearman R = 0.95 (P <0.0001)] (Fig. 1). The regression formula for standardizing AUCs estimated from different stage prevalences was AUC = 0.582 + 0.1056 x (DANA). The FT AUC standardized at the uniform DANA of 2.5 was 0.85. The AduAUC, which accounts for the observed DANA (ObDANA) vs a uniform DANA of 2.5, was therefore calculated as AduAUC = ObAUC + 0.1056 x (2.5 – ObDANA). The FT AUC standardized at the naturally observed DANA of 2.16 was 0.81. The AdnAUC, which accounts for the ObDANA vs the DANA of 2.16 found when fibrosis stages are distributed as in the population at large, was therefore calculated as AdnAUC = ObAUC + 0.1056 x (2.16 – ObDANA).
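The standardization step amounts to a linear shift of an observed AUC along the FT regression line. The Python sketch below encodes the regression and the AduAUC and AdnAUC formulas with the constants given in the text; the function names are hypothetical, and the sketch assumes, as the text does, that the FT slope of 0.1056 per METAVIR unit applies to the AUC being adjusted.

```python
SLOPE = 0.1056      # change in FT AUC per METAVIR unit of DANA (regression in the text)
INTERCEPT = 0.582
UNIFORM_DANA = 2.5
NATURAL_DANA = 2.16


def regression_auc(dana: float) -> float:
    """AUC predicted by the FT regression line for a given DANA."""
    return INTERCEPT + SLOPE * dana


def adjusted_auc(ob_auc: float, ob_dana: float, target_dana: float) -> float:
    """Shift an observed AUC along the regression slope from its observed DANA to a reference DANA."""
    return ob_auc + SLOPE * (target_dana - ob_dana)


def adu_auc(ob_auc: float, ob_dana: float) -> float:
    """Adjusted uniform AUC: reference DANA of 2.5."""
    return adjusted_auc(ob_auc, ob_dana, UNIFORM_DANA)


def adn_auc(ob_auc: float, ob_dana: float) -> float:
    """Naturally observed adjusted AUC: reference DANA of 2.16."""
    return adjusted_auc(ob_auc, ob_dana, NATURAL_DANA)


print(round(regression_auc(UNIFORM_DANA), 2))  # 0.85
print(round(regression_auc(NATURAL_DANA), 2))  # 0.81
```

An AUC lying exactly on the regression line is moved to the line's value at the reference DANA, so `adu_auc(regression_auc(d), d)` returns 0.846 for any `d`; the adjustment only changes how far an observed AUC sits from the line, not its distance from it.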
In the overview, the FT AUC varied (P <0.001) from 0.65 to 0.89 for DANA values ranging from 1.47 to 3.05. There was also a significant correlation between the FT AUC and DANA [Spearman R = 0.64 (P = 0.004)]. The regression in the overview did not differ from that of the integrated database, with almost all points falling within the 95% CI of the regression line (Fig. 1).
Discussion
AUCs according to the prevalence of fibrosis stages
First, this study demonstrated that the prevalence of the liver fibrosis stages defining advanced fibrosis is a major source of variability in assessing the diagnostic value of a fibrosis marker. In the present example, FT AUCs varied from 0.67 to 0.98 according to this prevalence. FT was chosen as an example, but this conclusion may apply to any quantitative marker when the diagnosis of a disease includes several stages. For a disease defined as one combination of stages vs another combination of stages, the AUCs for its diagnosis must be expressed in a standardized fashion according to the prevalence of each stage. We suggest using either a uniform distribution with the same prevalence for each stage (corresponding to a DANA of 2.5 METAVIR units between advanced and nonadvanced fibrosis) or the naturally observed distribution of stages (corresponding to a DANA of 2.16 METAVIR units). In the present integrated database, the ObAUC was 0.80, the standardized AduAUC was 0.85, and the AdnAUC was 0.81.
This finding is clinically significant, because it makes apparently discordant results for a given biomarker in the literature easier to interpret. For example, the 2 most recently published studies of FT in chronic hepatitis C found an ObAUC of 0.74 (Wilson et al. (28)) and an ObAUC of 0.83 (Sene et al. (29)). This difference can be fully explained by the difference in fibrosis stage distributions, with a DANA of 1.87 for the former and 3.05 for the latter, a net DANA difference of 1.18. According to the relationship between AUC and DANA for FT (Fig. 1), a difference of 1 METAVIR unit in DANA corresponds to a difference of 0.1056 in AUC. The DANA difference of 1.18 therefore corresponds to an AUC difference of 1.18 x 0.1056 = 0.12, and the Wilson AUC adjusted to the same DANA as the Sene study is 0.74 + 0.12 = 0.86.
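The Wilson vs Sene comparison can be reproduced with the same linear shift, and both studies can also be placed on the naturally observed DANA of 2.16 (their AdnAUCs). This Python sketch uses only the AUCs, DANAs, and slope quoted in the text; the helper `shift_auc` is hypothetical and, like the text, assumes that the FT slope holds for both study populations.

```python
SLOPE = 0.1056  # AUC change per METAVIR unit of DANA, from the FT regression


def shift_auc(auc: float, from_dana: float, to_dana: float, slope: float = SLOPE) -> float:
    """Re-express an observed AUC at a different DANA along the regression slope."""
    return auc + slope * (to_dana - from_dana)


# Observed AUCs and DANAs quoted in the text
studies = {"Wilson": (0.74, 1.87), "Sene": (0.83, 3.05)}

wilson_auc, wilson_dana = studies["Wilson"]
_, sene_dana = studies["Sene"]
print(f"Wilson at Sene's DANA: {shift_auc(wilson_auc, wilson_dana, sene_dana):.2f}")  # 0.86

# Both studies placed on the naturally observed DANA of 2.16 (AdnAUC)
for name, (auc, d) in studies.items():
    print(f"{name}: AdnAUC = {shift_auc(auc, d, 2.16):.2f}")
# Wilson: AdnAUC = 0.77
# Sene: AdnAUC = 0.74
```

At a common DANA, the 0.09 gap between the two observed AUCs not only disappears but reverses (0.86 vs 0.83 at DANA 3.05; 0.77 vs 0.74 at DANA 2.16).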
These findings also clarify some controversies concerning biomarker AUCs for adjacent stages. The usual criticism of biomarkers is that "there is a lack of reliable identification and classification of the intermediate stages of fibrosis" (1)(2)(4)(32). Because AUCs range from 0.60 to 0.71 when adjacent stages are compared, biomarkers are usually considered "neither sensitive nor specific" enough to differentiate between these stages. These reviews and editorials suggested that biomarkers could replace biopsy for the diagnosis of cirrhosis but not for intermediate stages. These statements must be revisited in light of the spectrum effect. The DANA between adjacent stages is by definition the smallest possible DANA, equal to 1. Biomarkers have lower AUCs when discriminating between adjacent stages than between extreme stages, but the same criticism applies to biopsy: compared with the entire liver, a 15-mm biopsy (even when nonfragmented) has an AUC of only 0.82 for adjacent stages (11), and nonfragmented biopsies >15 mm show a 33% overlap between adjacent stages (33). Therefore an ideal biomarker with 0% false positives and 0% false negatives (AUC of 1) against the true gold standard (the entire liver) will have a mean AUC of 0.82 when evaluated against a 15-mm liver biopsy. The statement that only biopsy can identify and classify the "intermediate" (more appropriately, "adjacent") stages of fibrosis is therefore not evidence based; it probably reflects unawareness of the spectrum effect. Biopsy accuracy estimated with the AUC also depends on the spectrum effect (11): the AUC of biopsy is smaller for discriminating between adjacent stages (DANA = 1) than between advanced and nonadvanced fibrosis (naturally observed DANA = 2.16) or between extreme fibrosis stages (DANA = 4).
Another common criticism is that, although many biomarkers are recognized to have an excellent AUC for F4 vs F0, F1, F2, and F3, most patients with cirrhosis (F4) could easily be diagnosed without a liver biopsy or a biomarker. However, the cirrhotic patients included in the present study were not symptomatic. Furthermore, we previously demonstrated the higher diagnostic value of FT vs classical markers of cirrhosis (34), as well as a prognostic value similar to that of biopsy (35).
Limitations of the study
This study has several limitations:
Our statistical analysis used resampling from the same database, which could reinforce unknown biases.
We used a simple standardization for the uniform DANA, giving the same weight to each fibrosis stage. There is controversy about whether METAVIR stage is linearly associated with the quantity of fibrosis and about whether fibrosis progresses linearly. However, even if the exact model is unknown, the METAVIR scoring system is one of the best-validated scoring systems and is recommended in several guidelines.
Another limitation is that uniform stage prevalences are not seen in practice. Indeed, the prevalence of stage F0 in observational studies is low (10%), half the uniform prevalence of 20%. However, the prevalences seen in observational studies are themselves imperfect, because it is ethically impossible to perform liver biopsy in a large population of individuals without symptoms and with normal liver function tests. Most observational studies have been performed with biopsies in reference-center populations. It therefore remains possible that, with noninvasive biomarkers, a more realistic prevalence of stage F0 would be closer to 20% (the neutral hypothesis of uniform stage prevalence) than to 10% (the naturally observed prevalence, which could underestimate F0). Indeed, using biomarkers in 33 731 patients with presumed chronic hepatitis C, the prevalence of stage F0 was 29% (9881 of 33 731) (36).
We also used a more realistic standardization that mirrors the distribution across stages found in studies with population-based sampling and liver biopsy. One may ask why standardization is needed at all, since the ideal design for an accuracy study is the inclusion of a consecutive series of patients. In the present study, the 3 populations included were consecutive patients in each center, thus reflecting the naturally observed prevalence of fibrosis stages; this is consistent with the similar distribution of fibrosis stages observed in the present study and in our previous study of fibrosis progression (14). However, previously published overviews and editorials comparing the accuracy of fibrosis biomarkers compared and discussed AUCs without any reference to the spectrum effect. The proposed standardizations can help decision-makers estimate the AUC of a biomarker that corresponds to their own setting.
Diagnostic studies comparing different markers require direct comparisons in the same patients to avoid the bias and variability, including spectrum bias, that affect indirect comparisons. The number of new fibrosis markers is increasing rapidly, and it will become increasingly difficult to compare all of them in the same patients (4). AUC standardization cannot replace direct comparisons between markers, but it can reduce misinterpretation of indirect comparisons.
In conclusion, published results of diagnostic studies and overviews of fibrosis markers can be misinterpreted by nonspecialists who are unaware of the impact of fibrosis stage prevalences on AUC estimates. In retrospect, all editorials and overviews must be revisited, because AUCs can range from 0.67 to 0.98 for the same test and the same disease. On the basis of these results, we propose standardizing the expression of AUCs according to the prevalence of the fibrosis stages defining advanced and nonadvanced fibrosis.
Beyond the diagnosis of liver fibrosis, these findings suggest that the evaluation of biomarker utility could be better standardized. When disease stages are "grouped", the grouping should be standardized to control for differences in stage distribution between populations. Prevalences could be standardized according to the stage prevalences in observational studies, if these are well established, or according to a more neutral hypothesis if they are not available.
Acknowledgments
Financial disclosures: Thierry Poynard has a potential conflict of interest as the inventor of FibroTest with a capital interest in Biopredictive, the company marketing the FibroTest. Mona Munteanu is a full-time employee of Biopredictive.
Acknowledgments: Special thanks to Pierre Bedossa, Patrice Cacoub, Alfredo Alberti, and Giada Sebastiani for furnishing details of their previous publications.