Clinical Chemistry 53: 164-172, 2007. First published December 21, 2006; 10.1373/clinchem.2006.076398
(Clinical Chemistry. 2007;53:164-172.)
© 2007 American Association for Clinical Chemistry, Inc.


Review

Impact of Adjustment for Quality on Results of Metaanalyses of Diagnostic Accuracy

Mariska Leeflang1,a, Johannes Reitsma1, Rob Scholten2, Anne Rutjes1, Marcello Di Nisio3, Jon Deeks4 and Patrick Bossuyt1

1 Department of Clinical Epidemiology, Biostatistics and Bioinformatics.
2 The Dutch Cochrane Centre, Academic Medical Center, University of Amsterdam, The Netherlands.
3 Department of Medicine and Aging, School of Medicine and Aging Research Center, Ce.S.I., "Gabriele D’Annunzio" University Foundation, Chieti-Pescara, Italy.
4 Department of Public Health and Epidemiology, University of Birmingham, Edgbaston, Birmingham, United Kingdom.

aAddress correspondence to this author at: Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, P.O. Box 22700, 1100 DE Amsterdam, The Netherlands. Fax 0031-20-6912683; e-mail m.m.leeflang{at}amc.uva.nl.


   Abstract
 
Background: We examined whether and to what extent different strategies of defining and incorporating quality of included studies affect the results of metaanalyses of diagnostic accuracy.

Methods: We evaluated the methodological quality of 487 diagnostic-accuracy studies in 30 systematic reviews with the QUADAS (Quality Assessment of Diagnostic-Accuracy Studies) checklist. We applied 3 strategies that varied both in the definition of quality and in the statistical approach to incorporate the quality-assessment results into metaanalyses. We compared magnitudes of diagnostic odds ratios, widths of their confidence intervals, and changes in a hypothetical clinical decision between strategies.

Results: Following 2 definitions of quality, we concluded that only 70 or 72 of 487 studies were of "high quality". This small number was partly due to poor reporting of quality items. None of the strategies for accounting for differences in quality led systematically to accuracy estimates that were less optimistic than ignoring quality in metaanalyses. Limiting the review to high-quality studies considerably reduced the number of studies in all reviews, with wider confidence intervals as a result. In 18 reviews, the quality adjustment would have resulted in a different decision about the usefulness of the test.

Conclusions: Although reporting the results of quality assessment of individual studies is necessary in systematic reviews, reader wariness is warranted regarding claims that differences in methodological quality have been accounted for. Obstacles for adjusting for quality in metaanalyses are poor reporting of design features and patient characteristics and the relatively low number of studies in most diagnostic reviews.


   Introduction
 
Healthcare professionals seeking the best information about diagnostic tests increasingly turn to systematic reviews of test-accuracy studies, yet a review’s summary estimate can be biased if the studies in the review are flawed. An evaluation of the quality of the original studies is therefore an essential part of any systematic review.

The methodological quality of studies can be defined in terms of their susceptibility to bias. Studies with methodological shortcomings, such as inclusion of healthy control individuals or selective use of multiple reference standards to verify index test results, have produced different measures of test accuracy (1)(2)(3)(4)(5). In most cases, such deficiencies have been associated with inflated estimates of diagnostic accuracy. The inclusion of lower-quality studies in a metaanalysis may therefore produce unrealistically high-accuracy estimates. Accounting for quality differences can be expected to produce less optimistic summary estimates of diagnostic accuracy.

Design feature variability and the presence of studies with suboptimal designs in a systematic review may also increase heterogeneity in results among studies (6)(7)(8). Given these considerations, one can expect strategies that account for quality in metaanalyses of diagnostic accuracy to lead to more homogeneous results and therefore to more precise estimates, with narrower confidence intervals around the accuracy measures of interest than estimates without quality adjustment.

Quality assessment of individual studies in a review may identify both design deficiencies that can lead to bias and sources of variation that can lead to heterogeneity. Several quality-assessment tools, most of which use a "checklist" approach, have been developed for diagnostic-accuracy studies (5). A recently developed generic quality-assessment tool based on a modified Delphi procedure (5)(9) has been recommended by the Cochrane Collaboration as a starting point for quality assessment in diagnostic reviews (10).

Although quality appraisal has been recognized as an essential step of systematic reviews, how study quality should be addressed in metaanalyses of diagnostic-accuracy studies is less clear (5)(11). Strategies to incorporate study quality into metaanalyses can be broadly divided into 3 categories: including all studies, irrespective of quality; analyzing subgroups that differ in quality; and multivariable regression analysis. The slightly different recommendations given in the guiding reports are all based on sparse evidence (12)(13)(14).

To test the hypothesis that adjustment for quality produces less optimistic estimates of diagnostic accuracy and narrower confidence intervals, we compared 3 different strategies for incorporating quality in analyzing a number of previously published systematic reviews of diagnostic-accuracy studies.


   Materials and Methods
 
We studied 3 alternative strategies for incorporating quality in metaanalysis and compared their results with those obtained by analyzing all available studies irrespective of their quality, in a series of systematic reviews of diagnostic-accuracy studies. Within each systematic review, we compared the summary diagnostic odds ratios (DORs) and the widths of the confidence intervals across these strategies.

study set
To include a broad sample of diagnostic studies that examined a variety of tests over time, we conducted a systematic electronic search for systematic reviews of diagnostic-accuracy studies published between January 1999 and April 2002 (5). This search produced a set of 28 reports of systematic reviews (see appendix in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol53/issue2). Details of the search strategy are available from the authors. Inclusion criteria were (a) a systematic review of diagnostic test-accuracy studies, (b) inclusion of at least 10 original studies, (c) no exclusion of primary studies based on design features, and (d) the ability to reproduce the 2 x 2 tables from the original studies. The 28 reports yielded 30 systematic reviews. Details of the inclusion process are reported elsewhere (5).

A variety of conditions and index tests were studied in these 30 reviews (Table 1). The median number of studies in a review was 14 (interquartile range, 10–20). The median sample size of the individual studies was 100 (interquartile range, 43–288).


Table 1. Characteristics of the systematic reviews in our study set.

assessment of methodological quality
We assessed the methodological quality of all 487 studies included in the 30 reviews with items from the QUADAS instrument (9) (Table 2). We limited ourselves to the 7 QUADAS items most closely related to methodological quality and did not use the items that referred to quality of reporting. We dichotomized each item by scoring as deficient any study feature that was not reported.


Table 2. QUADAS items included in the 2 definitions of "high quality".

QUADAS item 1 (Table 2) refers to both the generalizability of results and the possibility that the study may produce biased results. We assessed 3 patient-spectrum components that refer to the distorted selection of participants, because previous studies have linked these components to biased accuracy estimates. These components were consecutive enrollment of patients, case-control or 2-gate design vs cohort design, and avoidance of limited challenge (2)(4). Limited challenge was defined as the exclusion of patients with disease characteristics that may produce false-positive or false-negative results (e.g., exclusion of patients with existing lung disorders in an accuracy study of spiral computed tomography for the diagnosis of pulmonary embolism). A 2-gate study was defined as a case-control study in which cases and controls are sampled from 2 distinct source populations by means of different selection criteria (15).

Two independent assessors conducted quality assessments, and consensus meetings resolved disagreements. If necessary, a third person made the final decision.

metaanalysis
We used the summary ROC model of Moses and Littenberg for our metaanalysis (16)(17)(18). Their model uses linear regression analysis to examine how D, the natural logarithm of the DOR, changes as a function of S, which is the sum of logit(sensitivity) and logit(1 – specificity). S is related to the threshold for classifying a test as positive.
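As an illustration of the two quantities defined above, the following minimal sketch computes D and S for a single 2 x 2 table. The function names and the example counts are hypothetical and are not taken from the study data; the sketch assumes a table without empty cells.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def d_and_s(tp, fp, fn, tn):
    """Return (D, S) for one 2 x 2 table that has no empty cells.

    D = logit(sensitivity) - logit(1 - specificity) = ln(DOR)
    S = logit(sensitivity) + logit(1 - specificity), related to the threshold
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    d = logit(sensitivity) - logit(1.0 - specificity)
    s = logit(sensitivity) + logit(1.0 - specificity)
    return d, s

# Hypothetical study: 80 true positives, 10 false positives,
# 20 false negatives, 90 true negatives.
d, s = d_and_s(tp=80, fp=10, fn=20, tn=90)
dor = math.exp(d)  # equals (80 * 90) / (10 * 20)
```

Because D is the log of the DOR, exponentiating D recovers the familiar cross-product ratio of the table.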

We modeled the intercept and slope of the model as fixed effects but included a random effect to allow for variation beyond chance among studies (19). We weighted studies by the inverse of the variance of the log DOR to allow for the precision with which each study measured the log DOR. This procedure gave more weight to larger studies.

In the multivariable quality-adjustment strategies, covariates representing quality items were added to the model; this step allowed the intercept and slope in the regression analysis to differ between subgroups of studies defined by the corresponding covariate. In all strategies, we estimated the summary DOR over all studies in the metaanalysis at the mean S value of these studies. Because the DOR cannot be calculated in 2 x 2 tables containing a zero, we added 0.5 to all 4 cells in these situations as a continuity correction (16)(20).
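The sketch below illustrates the main ingredients described above: the 0.5 continuity correction, inverse-variance weights, a weighted regression of D on S, and evaluation of the fitted line at the mean S. It is a simplified fixed-effect illustration only. It omits the random effect and the quality covariates that the authors fitted with SAS, and it assumes the standard large-sample variance of a log odds ratio (the sum of the reciprocal cell counts), which the text does not spell out. All function names are hypothetical.

```python
import math

def study_stats(tp, fp, fn, tn):
    """Return (D, S, variance of D) for one 2 x 2 table.

    If any cell is zero, 0.5 is added to all 4 cells first.
    """
    cells = [float(tp), float(fp), float(fn), float(tn)]
    if min(cells) == 0:
        cells = [c + 0.5 for c in cells]
    tp, fp, fn, tn = cells
    d = math.log((tp * tn) / (fp * fn))   # ln(DOR)
    s = math.log((tp * fp) / (fn * tn))   # logit(sens) + logit(1 - spec)
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn  # assumed approximation
    return d, s, variance

def summary_dor(tables):
    """Weighted least-squares fit of D on S, evaluated at the mean S.

    Assumes at least 2 studies with different S values.
    """
    stats = [study_stats(*t) for t in tables]
    d = [x[0] for x in stats]
    s = [x[1] for x in stats]
    w = [1.0 / x[2] for x in stats]       # inverse-variance weights
    sw = sum(w)
    s_bar = sum(wi * si for wi, si in zip(w, s)) / sw
    d_bar = sum(wi * di for wi, di in zip(w, d)) / sw
    sxx = sum(wi * (si - s_bar) ** 2 for wi, si in zip(w, s))
    sxy = sum(wi * (si - s_bar) * (di - d_bar) for wi, si, di in zip(w, s, d))
    slope = sxy / sxx
    intercept = d_bar - slope * s_bar
    mean_s = sum(s) / len(s)              # unweighted mean S of the studies
    return math.exp(intercept + slope * mean_s)

# Hypothetical tables given as (tp, fp, fn, tn).
tables = [(80, 10, 20, 90), (90, 20, 10, 80), (60, 4, 40, 96)]
pooled = summary_dor(tables)
```

In this toy example every table has the same DOR, so the fitted slope is essentially zero and the pooled value equals that common DOR.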

strategies for incorporating quality
We compared the following 3 statistical approaches to account for quality in metaanalyses: (a) The "restrict" strategy limited the metaanalysis to high-quality studies only. Studies were regarded as "high-quality" when they fulfilled all quality criteria. (b) The "adjust all" strategy involved multivariable adjustment for all individual quality items by including all these items in a single multivariable model, irrespective of the strength of the association between these items and the DOR. (c) The "selective adjustment" strategy consisted of multivariable adjustment for only those quality items that were significantly associated with the DOR in a univariable analysis (P for entry <0.2) (21)(22).

These strategies were compared with a reference strategy in which all studies within the original metaanalysis were included, irrespective of their quality characteristics.

Differences in results between strategies may depend both on the definition of quality and on the statistical approach used. We therefore considered 2 different sets of quality items to define higher-quality studies. The first set was chosen because there is empirical evidence that they can lead to biased results (4)(5). This set, referred to as the "evidence-based" quality definition, includes QUADAS items 5, 6, 10, and 11 (Table 2). The second set of quality items (QUADAS items 1, 5, and 6) is referred to as the "common practice" quality definition and was selected because these 3 items are often applied in diagnostic reviews (5)(11). The restrict strategy and the adjust-all strategy were applied twice, once with the evidence-based definition of quality and once with the common-practice definition.
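The two quality definitions and the restrict strategy can be sketched as a simple filter. The data layout below (a dictionary of QUADAS item scores per study) is hypothetical, and treating item 1 as a single yes/no score is an assumption, because the text assesses item 1 through 3 separate components. Items that were not reported count as deficient, as described under the assessment of methodological quality.

```python
# QUADAS item numbers in each definition, as given in the text.
EVIDENCE_BASED = (5, 6, 10, 11)
COMMON_PRACTICE = (1, 5, 6)

def is_high_quality(item_scores, definition):
    """True only if every item in the definition is scored "yes".

    item_scores maps a QUADAS item number to "yes", "no", or None
    (not reported); missing and unreported items count as deficient.
    """
    return all(item_scores.get(item) == "yes" for item in definition)

def restrict(studies, definition):
    """Keep only the studies that fulfil all criteria of the definition."""
    return [s for s in studies if is_high_quality(s["quadas"], definition)]

# Hypothetical studies for illustration.
studies = [
    {"id": "A", "quadas": {1: "yes", 5: "yes", 6: "yes", 10: "yes", 11: "yes"}},
    {"id": "B", "quadas": {1: "yes", 5: "yes", 6: "yes", 10: None, 11: "yes"}},
    {"id": "C", "quadas": {1: "no", 5: "yes", 6: "yes", 10: "yes", 11: "yes"}},
]
evidence_based_ids = [s["id"] for s in restrict(studies, EVIDENCE_BASED)]
common_practice_ids = [s["id"] for s in restrict(studies, COMMON_PRACTICE)]
```

Study B drops out of the evidence-based subset only because item 10 was not reported, which shows how poor reporting shrinks the high-quality subset.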

comparisons and analysis
We compared the summary DOR and its 95% confidence interval for the reference strategy, which included all studies, with the 3 quality-adjusting strategies in all 30 systematic reviews. Differences in results between strategies were analyzed within each systematic review with the Wilcoxon signed rank test to determine whether a strategy consistently led to higher or lower estimates of diagnostic accuracy. To investigate whether the strategies that adjusted for quality also resulted in more precise summary DOR estimates, we again used the Wilcoxon signed rank test to compare the different approaches with respect to the absolute widths of the 95% confidence intervals around the summary DOR, measured on the natural-logarithm scale.
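For readers unfamiliar with the paired comparison described above, the following is a minimal pure-Python sketch of an exact two-sided Wilcoxon signed rank test. The text does not state whether an exact or an asymptotic version was used, so the exact version here is an assumption, and the input values are hypothetical paired differences (for example, adjusted minus reference ln DOR, one per review).

```python
def signed_rank_exact(differences):
    """Exact two-sided Wilcoxon signed rank test.

    Zero differences are dropped; tied absolute differences receive
    average ranks. Returns (W_plus, p_value).
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)

    # Null distribution of W_plus: each rank is added with probability 1/2.
    # Ranks are doubled so that average ranks stay integers.
    doubled = [int(round(2 * r)) for r in ranks]
    total = sum(doubled)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in doubled:
        for t in range(total, r - 1, -1):
            counts[t] += counts[t - r]
    observed = int(round(2 * w_plus))
    extreme = min(observed, total - observed)
    p_value = min(1.0, 2 * sum(counts[: extreme + 1]) / 2 ** n)
    return w_plus, p_value

# Hypothetical paired differences for 5 reviews, all in the same direction.
w_plus, p_value = signed_rank_exact([0.4, 0.9, 1.3, 2.0, 2.6])
```

With only a handful of pairs, even perfectly consistent differences cannot reach conventional significance, which is relevant when a strategy can be applied in only a few reviews.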

To determine whether the change in summary DOR would affect clinical decisions, we used 4 arbitrary categories, which were defined by the absolute size of the summary DOR. If a metaanalysis resulted in a point estimate of the DOR <16, the test was regarded as not useful. We regarded a test with a DOR of 16–81 as moderately useful, a test with a DOR of 81–361 as useful, and a test with a DOR >361 as very useful. The DOR values of 16, 81, and 361 correspond to sensitivity-specificity pairs of 80%–80%, 90%–90%, and 95%–95%, respectively.
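The correspondence between the DOR cutoffs and the sensitivity-specificity pairs, and the resulting usefulness categories, can be checked with a short sketch. The function names are hypothetical, and the handling of values that fall exactly on 81 is an assumption, because the ranges in the text share that boundary.

```python
def dor_from_accuracy(sensitivity, specificity):
    """DOR = odds of a positive test in diseased / odds in nondiseased."""
    return (sensitivity / (1 - sensitivity)) * (specificity / (1 - specificity))

def usefulness(dor):
    """Map a summary DOR to one of the 4 arbitrary categories in the text.

    Assumption: a DOR of exactly 81 is placed in the "useful" category.
    """
    if dor < 16:
        return "not useful"
    if dor < 81:
        return "moderately useful"
    if dor <= 361:
        return "useful"
    return "very useful"

def decision_changes(reference_dor, adjusted_dor):
    """True if quality adjustment moves the test to another category."""
    return usefulness(reference_dor) != usefulness(adjusted_dor)

# 80%-80%, 90%-90%, and 95%-95% give DORs of about 16, 81, and 361.
cutoffs = [dor_from_accuracy(p, p) for p in (0.80, 0.90, 0.95)]
```

For example, a reference DOR of 60 and an adjusted DOR of 100 would move a test from "moderately useful" to "useful", which counts as a changed decision.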

We used SAS for Windows, version 9.1.3 (SAS Institute) for all analyses and the proc mixed procedure in SAS to fit all models.


   Results
 
How often the 7 QUADAS items were fulfilled in the 487 studies is shown in Fig. 1. Nonreporting of items was common, particularly for blinding of the index test (49%) and the reference test (72%), adequate time interval between the index test and the reference standard (42%), and whether patients were consecutively included (34%).


Figure 1. Overall results of quality assessment of the various QUADAS items in the 487 primary studies.

Items 1a, 1b, and 1c refer to the different components of patient spectrum as we extracted them.

Studies of the case-control or 2-gate type were included in 9 of the 30 reviews. Whether all patients had received the reference standard and whether the reference standard was the same for each patient were well reported (99% of the studies). In 3 reviews, the primary studies used different reference standards to verify index test results.

Applying the evidence-based definition of quality (items 5, 6, 10, and 11 of the QUADAS checklist) identified 72 (15%) of the 487 primary studies as high quality. With this definition, 12 of the 30 systematic reviews had no high-quality studies, and 9 reviews included at least 3 high-quality studies.

Applying the common-practice definition identified 70 high-quality studies (14%). With this definition, 9 systematic reviews contained no high-quality studies, and 11 reviews had at least 3 high-quality studies. Use of both definitions yielded only 3 reviews that contained ≥3 high-quality studies.

comparing the pooled estimates of the various strategies
The summary DORs and the corresponding 95% confidence intervals were obtained for all 30 systematic reviews with the reference and 3 quality-adjustment strategies (Fig. 2).


Figure 2. Point estimates of the DOR and confidence intervals of all analyses.

The abscissa represents the DOR, and the ordinate lists each metaanalysis by the first author, with the number of included studies in parentheses. Dotted lines reflect a DOR of 16 (i.e., a test with 80% sensitivity and 80% specificity), a DOR of 81 (90% sensitivity and 90% specificity), and a DOR of 361 (95% sensitivity and 95% specificity). Analyses are indicated as follows: not incorporating quality (♦), evidence-based restricted (■), common-practice restricted (□), evidence-based multivariable (▲), common-practice multivariable (△), and selective adjustment (x).

The evidence-based restrict strategy, which analyzed only high-quality studies according to the evidence-based definition, could be applied in 9 reviews containing ≥3 high-quality studies. In 3 cases, the DOR for the high-quality studies was higher than the DOR obtained by ignoring quality and including all studies, whereas the opposite occurred in 5 cases (P = 0.64). In 1 review, the DOR did not change, because all studies were high-quality studies according to the evidence-based definition. We found only 2 or fewer high-quality studies in the other reviews, and we did not calculate a summary estimate based on these small numbers.

The restrict strategy with the common-practice definition could be used in 11 reviews. This restrict strategy produced a higher DOR in 4 metaanalyses and a lower estimate in 7 others. The mean odds ratio was not significantly higher or lower when quality was not incorporated, compared with the different restrictive strategies (Table 3).


Table 3. Comparison of DORs and 95% confidence interval widths of different quality-incorporating strategies.

When we included all the items of the evidence-based quality definition as covariates in the multivariable model, model building failed in 9 reviews. In these reviews, at least 1 of the quality criteria was not fulfilled by any of the included studies. In 9 of the other 21 reviews, the adjust-all strategy resulted in a DOR estimate that was higher than when quality was not incorporated; 11 times the estimate was lower. In 1 review, all of the original studies could be regarded as of high quality, so there was no change in the summary DOR.

With the common-practice definition, we were able to make a multivariable adjust-all model in 23 reviews. The estimated DOR was higher in 10 reviews and lower in 13. The differences between analyzing studies irrespective of their quality and analyses with the 2 multivariable strategies were not significant (Table 3).

The selective-adjustment strategy included only items that were significantly associated with accuracy in a univariable analysis (P <0.2). In 18 reviews, none of the QUADAS items was significantly associated with accuracy, so no covariates entered the model and the summary DOR was the same as when quality was disregarded. In 5 reviews, a single QUADAS item had a significant effect; 2 items were significant in a further 5 metaanalyses, 3 items in 1, and 4 items in 1. The selective-adjustment strategy led to a higher estimate in 5 cases and to a lower estimate in 7 cases, compared with the metaanalysis in which quality was not incorporated.

Fig. 3 shows the relative DORs (compared with not including quality in the analysis) for the various quality-adjustment strategies. The symmetrical distribution around unity illustrates that there is no systematic trend in underestimating or overestimating the DOR of a test. However, in 5 cases, the alternative strategy resulted in a DOR >5 times higher than when quality was disregarded; in 3 cases the relative DOR was <0.2.


Figure 3. Relative DOR for each metaanalysis.

DORs of different quality-adjusting strategies are compared with the DOR for the ignore-quality strategy. A relative DOR >1.0 means that the DOR of the quality-adjusted metaanalysis was higher than when quality was not taken into account. A relative DOR <1.0 means that the DOR was greater when no adjustment for quality was made. The thin line represents a relative DOR of 1.0, i.e., no difference between the adjusted and nonadjusted analyses. Indicated are the evidence-based restricted strategy (■), the common-practice restricted strategy (▲), the evidence-based multivariable strategy (▼), the common-practice multivariable strategy (♦), and the selective-adjustment strategy (•).

None of the quality-adjustment strategies produced systematically narrower confidence intervals for the summary DOR than analyzing studies irrespective of their quality (Table 3). The confidence intervals were significantly wider with the restrict and adjust-all strategies (P <0.01) but did not significantly differ with the selective-adjustment method (P = 0.08).

Because differences between strategies can be due to both differences in quality definitions and differences in statistical methods, we compared the results between statistical methods within 1 definition. We also compared the results with 2 quality definitions within 1 strategy. We observed no systematic differences between the 2 approaches, either for the summary estimates or for their 95% confidence intervals.

The judgment about the usefulness of a test based on the magnitude of the summary DOR was not affected in 12 of the 30 reviews with any of the quality-adjustment strategies (Fig. 2). In 18 reviews, the quality-adjusted DOR obtained with 1 or more of the quality-adjustment strategies ended in a different category than the DOR obtained with all studies included. The DOR was higher in 14 cases and lower in 17 others (Fig. 2).


   Discussion
 
In this reanalysis of 30 previously published systematic reviews, we found no evidence for our hypothesis that adjustment for differences in methodological quality in metaanalysis leads to less optimistic summary diagnostic-accuracy estimates with less variability in results among better-quality studies. We saw no such overall effects for strategies that relied on restriction to high-quality subsets, on multivariable adjustment for a set of quality items, or on selective multivariable adjustment for significant quality items.

A main problem that authors of systematic reviews encounter is poor reporting of study characteristics, and our study was no exception (23). We scored any study feature that was not reported as deficient. Dichotomizing QUADAS items into a simple "yes" or "no" can lead to loss of information, especially when many study characteristics are unreported. Some QUADAS items, such as the use of an adequate reference standard and the generalizability of the patient spectrum, could not be assessed at all in our data set. Both of these items can have a large effect on the performance of a test under study, and a proper incorporation of these characteristics could have resulted in a larger effect of the quality-adjustment strategies.

Because our analysis unit was the single metaanalysis, our sample size was only 30. Therefore, the power for detecting significant trends between strategies was limited, despite the inclusion of 487 individual studies. The 30 systematic reviews covered a wide range of clinical topics and diagnostic tests, with a wide variability in the magnitude of the DOR. Our primary outcome variable was the DOR, which is a single accuracy indicator that incorporates both the sensitivity and specificity of a test. Such a single indicator is convenient in the analysis, but it also means that any given summary DOR can be produced by innumerable sensitivity-specificity combinations. In practice, the value of 1 accuracy measure, say sensitivity, may be more critical than another if the implications of false-positive and false-negative test results differ in severity.

In our analysis, we refrained from calculating summary quality scores for studies and labeling any study that exceeded a certain threshold score as high quality. Such summary quality scores have been extensively studied—and criticized—in systematic reviews of intervention studies. Different shortcomings in study design may cause different forms of bias, making it almost impossible to determine the weight that should be given to each quality item in calculating such quality scores (24)(25). We also did not include a sequential analysis of the studies based on their quality ranking, which would have led to a quality-adjusted cumulative metaanalysis (26). This strategy also requires a hierarchical approach to study quality in that it assumes that some criteria are more important than others and that studies fulfilling more criteria are of higher quality.

Several previous studies have linked design features of diagnostic-accuracy studies to changes in accuracy estimates. One systematic review documented the theoretical and empirical evidence for several sources of bias (4)(5). Two publications, which examined these effects in a collection of systematic reviews, both reported significant effects for a number of features across metaanalyses (1)(2). We can only speculate why we failed to find any systematic differences from incorporating these study features in the metaanalysis process. These earlier studies analyzed the impact of deficiencies in quality in a large number of diagnostic-accuracy studies across a variety of systematic reviews, whereas our study assessed the impact of these quality items on estimates of diagnostic accuracy within systematic reviews. Furthermore, the number of studies with methodological deficiencies was small in a number of the systematic reviews included in our analysis, whereas other reviews contained only studies with deficiencies. Many of these studies with a deficient study design had a small sample size (27). Because the weight of an individual study depends on sample size, these studies had only a minor impact on the summary estimate of diagnostic accuracy. Furthermore, if 2 or more quality items influence accuracy but in opposing directions, the overall estimate obtained irrespective of quality may be similar to the estimate based on high-quality studies only. It is also possible that incomplete reporting has led to misclassification of design features in our project, which may have jeopardized our attempts to find differences in accuracy.

There are other potential explanations for our failed attempts at quality adjustment. The effects of several study-design features may not always be in the same predictable direction. Whether partial verification, for example, will lead to accuracy estimates that are unchanged, lower, or higher depends on the pattern of verification and the reference standards being used. The ratio of patients with unverified positive index test results to patients with unverified negative test results matters, in particular when being verified or not is related to the presence or absence of the target condition.

Similar remarks have been made in the field of intervention studies, where more metaepidemiologic studies like ours have been performed (28)(29). The aim in metaepidemiologic studies is to evaluate the importance of 1 or more design features across a substantial number of systematic reviews. These studies have shown that metaepidemiologic studies require substantial numbers of systematic reviews with sufficient differences in methodological quality among the included studies. Furthermore, if the effects of design features vary in direction among reviews or even among studies within a single review, metaepidemiologic studies may produce summary estimates that suggest no effect at all (30)(32). Although we have found no systematic trend in results among strategies, reviews in which adjusting for quality has led to substantially different results clearly exist. Because we do not know the true magnitude of accuracy, it is impossible to tell whether the adjusted estimates were closer to the truth.

Not only did we fail to find support for our hypothesis that adjusting for quality will result in less optimistic estimates of test accuracy, we also found no evidence for the hypothesis that adjusting for quality leads to less heterogeneity in results and therefore to smaller confidence intervals. On the contrary, the alternative analyses generally produced broader confidence limits. The main reason for this result is that the alternative strategies were based on fewer studies.

Our study did not produce evidence for the superiority of one type of adjustment over another. Low-quality studies can produce accuracy statistics that do not differ from those obtained in high-quality studies. Although methodological quality may influence the results of metaanalyses, a direct association with results is not necessarily present.

In any review, poor quality will affect the trustworthiness of the conclusions of that review. Our results indicate that the strategy used to correct for quality may affect the estimated accuracy, but not in a predictable way. Our results also indicate that measuring and incorporating quality in a diagnostic review is not a simple task of routinely scoring a few standard quality items and then adjusting for these variables in a multivariable model.

There may be good reasons to identify some quality criteria as crucial for the credibility and applicability of any systematic review. An example could be the selection of the reference standard—QUADAS item 3. These criteria may then be used as inclusion criteria for the review, and authors of systematic reviews might want to report how many studies had to be excluded based on that criterion.

Reporting the quality-assessment results of the studies included in a review remains a necessity because it informs readers about the overall quality of the studies included in the review and may point out differences in design that can help to explain some of the heterogeneity in results. The QUADAS instrument can be used for that purpose. We propose to score "not reported" as a separate category where applicable, and we hope that a more widespread implementation of the STARD statement will lead to better reporting in future reports of diagnostic-accuracy studies (33)(34).

We feel it necessary that quality-assessment results in a systematic review be summarized in a table or a figure. A table can list the extent to which each of the studies fulfilled the quality criteria. A figure, such as the stacked bar chart in Fig. 1, can then display the studies for which each of the respective criteria was fulfilled so that the reader can obtain an overview of the quality of the studies included in the review. Plotting results for all of the included studies in ROC space and coding individual studies by color or with symbols can help readers recognize the characteristics of individual studies.

In our view, whether quality is also to be incorporated in a metaanalysis depends on several factors. In the first place, analyzing quality is not even an option if the number of included studies is too low. If the results are very heterogeneous, quality differences can be used to search for an explanation for the heterogeneity, and such a search can be accommodated by stratification or, if appropriate, regression analysis. Caution is needed because it is not unusual for the potential explanations for observed differences to outnumber the studies in a systematic review. It is important to recognize the major limitations of metaepidemiologic approaches in metaanalysis.
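A stratified look at one quality item might be sketched as follows. This is a deliberately simplified illustration and not the analysis performed in this review: it pools the 2x2 counts within each stratum, which ignores between-study variation and the trade-off between sensitivity and specificity. The data, the item name `blinded`, the function `stratify`, and the threshold `min_studies` are all assumptions made for the example. The threshold only flags strata with too few studies to interpret.

```python
from collections import defaultdict

# Hypothetical 2x2 counts per study plus the score on one quality item.
STUDIES = [
    {"tp": 45, "fp": 5, "fn": 5, "tn": 45, "blinded": "yes"},
    {"tp": 40, "fp": 10, "fn": 10, "tn": 40, "blinded": "yes"},
    {"tp": 48, "fp": 2, "fn": 2, "tn": 48, "blinded": "no"},
    {"tp": 30, "fp": 10, "fn": 20, "tn": 40, "blinded": "not reported"},
]
CELLS = ("tp", "fp", "fn", "tn")


def stratify(studies, item, min_studies=2):
    """Pool 2x2 counts per level of `item` and flag sparse strata.

    `min_studies` is an arbitrary illustrative threshold, not a recommendation.
    """
    totals = defaultdict(lambda: dict.fromkeys(CELLS + ("n",), 0))
    for study in studies:
        stratum = totals[study[item]]
        for cell in CELLS:
            stratum[cell] += study[cell]
        stratum["n"] += 1

    summary = {}
    for level, t in totals.items():
        summary[level] = {
            "n_studies": t["n"],
            "sensitivity": t["tp"] / (t["tp"] + t["fn"]),
            "specificity": t["tn"] / (t["tn"] + t["fp"]),
            "too_few_studies": t["n"] < min_studies,
        }
    return summary


if __name__ == "__main__":
    for level, row in stratify(STUDIES, "blinded").items():
        print(level, row)
```

With several quality items and only a handful of studies, most strata will be flagged, which is the situation in which the caution expressed above applies.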

Quality is a multidimensional concept, and the importance of individual quality items will vary from one research project to another. The goal of adjusting for quality differences in metaanalysis will remain attractive but elusive until we have large-scale systematic reviews and fully informative reporting in individual studies.


   Acknowledgments
 
J.J.D. is supported in part by a Senior Scientist in Evidence Synthesis Award from the UK Department of Health.


   References

  1. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061-1066.
  2. Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, Van Rijn JC, Bossuyt PM. Evidence of bias and variation in diagnostic accuracy studies. CMAJ 2006;174:469-476.
  3. Westwood ME, Whiting PF, Kleijnen J. How does study quality affect the results of a diagnostic meta-analysis? BMC Med Res Methodol 2005;5:20.
  4. Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004;140:189-202.
  5. Whiting P, Rutjes AW, Dinnes J, Reitsma JB, Bossuyt PM, Kleijnen J. Development and validation of methods for assessing the quality of diagnostic accuracy studies. Health Technol Assess 2004;8:1-234.
  6. Dinnes J, Deeks J, Kirby J, Roderick P. A methodological review of how heterogeneity has been examined in systematic reviews of diagnostic test accuracy. Health Technol Assess 2005;9:1-128.
  7. Lijmer JG, Bossuyt PM, Heisterkamp S. Exploring sources of heterogeneity in systematic reviews of diagnostic tests. Stat Med 2002;21:1525-1537.
  8. Irwig L, Bossuyt P, Glasziou P, Gatsonis C, Lijmer J. Designing studies to ensure that estimates of test accuracy are transferable. BMJ 2002;324:669-671.
  9. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25.
  10. Whiting PF, Westwood ME, Rutjes AW, Reitsma JB, Bossuyt PN, Kleijnen J. Evaluation of QUADAS, a tool for the quality assessment of diagnostic accuracy studies. BMC Med Res Methodol 2006;6:9.
  11. Whiting P, Rutjes AW, Dinnes J, Reitsma JB, Bossuyt PM, Kleijnen J. A systematic review finds that diagnostic reviews fail to incorporate quality despite available tools. J Clin Epidemiol 2005;58:1-12.
  12. De Vet HC, van der WT, Muris JW, Heyrman J, Buntinx F, Knottnerus JA. Systematic reviews of diagnostic research. Considerations about assessment and incorporation of methodological quality. Eur J Epidemiol 2001;17:301-306.
  13. Deville WL, Buntinx F, Bouter LM, Montori VM, de Vet HC, van der Windt DA, et al. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002;2:9.
  14. Khan KS. Systematic reviews of diagnostic tests: a guide to methods and application. Best Pract Res Clin Obstet Gynaecol 2005;19:37-46.
  15. Rutjes AW, Reitsma JB, Vandenbroucke JP, Glas AS, Bossuyt PM. Case-control and two-gate designs in diagnostic accuracy studies. Clin Chem 2005;51:1335-1341.
  16. Littenberg B, Moses LE. Estimating diagnostic accuracy from multiple conflicting reports: a new meta-analytic method. Med Decis Making 1993;13:313-321.
  17. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993;12:1293-1316.
  18. Irwig L, Macaskill P, Glasziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995;48:119-130.
  19. Van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med 2002;21:589-624.
  20. Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat Med 2004;23:1351-1375.
  21. Steyerberg EW, Eijkemans MJ, Van Houwelingen JC, Lee KL, Habbema JD. Prognostic models based on literature and individual patient data in logistic regression analysis. Stat Med 2000;19:141-160.
  22. Steyerberg EW, Eijkemans MJ, Harrell FE, Jr, Habbema JD. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med 2000;19:1059-1079.
  23. Smidt N, Rutjes AW, van der Windt DA, Ostelo RW, Reitsma JB, Bossuyt PM, et al. Quality of reporting of diagnostic accuracy studies. Radiology 2005;235:347-353.
  24. Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA 1999;282:1054-1060.
  25. Whiting P, Harbord R, Kleijnen J. No role for quality scores in systematic reviews of diagnostic accuracy studies. BMC Med Res Methodol 2005;5:19.
  26. Detsky AS, Naylor CD, O’Rourke K, McGeer AJ, L’Abbe KA. Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol 1992;45:255-265.
  27. Kjaergard LL, Villumsen J, Gluud C. Reported methodological quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 2001;135:982-989.
  28. Deeks JJ, Dinnes J, D’Amico R, Sowden AJ, Sakarovitch C, Song F, et al. International Stroke Trial Collaborative Group; European Carotid Surgery Trial Collaborative Group. Evaluating non-randomised intervention studies. Health Technol Assess 2003;7:9.
  29. Sterne JA, Juni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in ‘meta-epidemiological’ research. Stat Med 2002;21:1513-1524.
  30. Balk EM, Bonis PA, Moskowitz H, Schmid CH, Ioannidis JP, Wang C, et al. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA 2002;287:2973-2982.
  31. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 1998;352:609-613.
  32. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408-412.
  33. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Standards for Reporting of Diagnostic Accuracy. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003;326:41-44.
  34. Smidt N, Rutjes AW, van der Windt DA, Ostelo RW, Bossuyt PM, Reitsma JB, et al. Reproducibility of the STARD checklist: an instrument to assess the quality of reporting of diagnostic accuracy studies. BMC Med Res Methodol 2006;6:12.



