Research Letter

Computerized Adaptive Test–Depression Inventory Not Ready for Prime Time—Reply

Robert D. Gibbons, PhD1; David J. Weiss, PhD2; Paul A. Pilkonis, PhD3; Ellen Frank, PhD3; Tara Moore, MA, MPH3; Jong Bae Kim, PhD1; David J. Kupfer, MD3
Author Affiliations:
1Center for Health Statistics, University of Chicago, Chicago, Illinois
2Department of Psychology, University of Minnesota, Minneapolis, Minnesota
3Western Psychiatric Institute, University of Pittsburgh, Pittsburgh, Pennsylvania
JAMA Psychiatry. 2013;70(7):763-765. doi:10.1001/jamapsychiatry.2013.1322.

In Reply The Carroll letter “Computerized Adaptive Test–Depression Inventory Not Ready for Prime Time” criticizes our recent article1 published in the November 2012 issue of this journal. Carroll suggests that:

Clinicians do not need another scale to screen for depression using 7 to 22 items. Existing scales do that well with 10 to 12 items and, unlike CAT-DI [Computerized Adaptive Test–Depression Inventory], provide a symptom crosswalk to DSM-IV criteria. … No analyses showed that CAT-DI performance matches existing scales.

This statement is not correct. Indeed, we reported convergent validity with the Patient Health Questionnaire-9, the Hamilton Rating Scale for Depression, and the Center for Epidemiologic Studies Depression Scale, with correlations in the r = 0.80 range. In terms of a “crosswalk to DSM-IV criteria,” the CAT-DI demonstrated sensitivity of 0.92 and specificity of 0.88 against a Structured Clinical Interview for DSM-IV–based diagnosis of major depressive disorder, showing that it provides a very strong linkage to DSM-IV criteria.

In terms of the need for such a scale, what the CAT-DI provides that none of the existing scales do is a standard error of measurement that can be used to assess the uncertainty in the severity score obtained. As described in our article, the adaptive nature of the CAT-DI provides consistent precision of measurement across respondents by administering different items to respondents of varying depressive severity. Furthermore, the degree of precision can be selected in advance of testing depending on the requirements of the specific application: measurements of depressive severity in a randomized clinical trial may require greater precision than measurements used for screening in primary care or for psychiatric epidemiology. The traditional scales cited by Carroll administer the same items to all respondents and therefore allow uncertainty to vary across respondents. Figure 2 in our article clearly shows that the CAT-DI provides a linear scale of measurement with homogeneous variance across different patient diagnostic strata (no depression, minor depression, and major depressive disorder), whereas the other scales show a lack of discrimination between no depression and minor depression, skewed distributions, and greater overlap across diagnostic groups. It is hard to understand how Carroll concludes that these scales do as well as the CAT-DI and that the psychometric rigor the CAT-DI adds to the literature is of no practical value. It is even more difficult to understand how Carroll concludes that “no analyses showed that CAT-DI performance matches existing scales.”
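The precision-based stopping logic described above can be sketched in a few lines. This is our illustration of the general principle only; the per-item information values and the fixed administration order are invented, not taken from the CAT-DI:

```python
import math

# Toy illustration (ours, not the CAT-DI algorithm) of a precision-based
# stopping rule: items are administered until the standard error of the
# severity estimate falls below a preset target. Under item response theory,
# SE = 1 / sqrt(total Fisher information accumulated so far); the per-item
# information values below are invented for illustration.
def items_needed(item_informations, se_target):
    """Return how many items are administered before SE <= se_target."""
    total_info = 0.0
    for n, info in enumerate(item_informations, start=1):
        total_info += info
        if 1.0 / math.sqrt(total_info) <= se_target:
            return n
    return len(item_informations)  # item bank exhausted

bank = [1.5] * 30                 # hypothetical bank: each item adds info 1.5
print(items_needed(bank, 0.3))    # stricter precision target -> 8 items
print(items_needed(bank, 0.5))    # looser target -> 3 items
```

Setting a smaller `se_target` before testing simply makes the loop run longer, which is the sense in which precision "can be selected in advance of testing."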

Carroll suggests that:

CAT-DI has other serious deficiencies. A guideline in multivariate analyses is that 10 times more subjects than items are needed for satisfactory solutions.

This statement is unclear and highlights the limitations of Carroll’s understanding of the statistical foundation of the CAT-DI. Multivariate analysis covers a wide range of statistical methods, including multivariate analysis of variance, multivariate regression, factor analysis for measurement data, item response theory, and many other possibilities. The specific requirement for the number of respondents relative to “items” depends on many factors, including the specific multivariate model, the types of hypotheses being tested (assuming they are being tested at all), effect sizes, and the specific method of estimation (eg, least squares vs maximum likelihood). In the context of item response theory, we certainly need more respondents than items, but the claim that the ratio must be 10:1 has no basis whatsoever in statistical theory. Since we are using item response theory to calibrate the model, the question is whether the solution is stable and the estimation procedure has converged. Carroll’s statement is apparently based on a quotation from Nunnally2 that predates the method of estimation used in our article (ie, marginal maximum likelihood estimation) and is therefore at best questionable in terms of the degree to which it applies and, more important, to what it applies. Furthermore, convergence was obtained both in the current study and in our previous study using a similar number of participants and items. If the more complex bifactor model were not appropriately estimated, it would be unlikely to provide such an overwhelming improvement in fit over the simpler unidimensional model. Finally, Carroll ignores the description of the analysis, where we clearly state that we used a balanced incomplete block design to select a subset of approximately 250 items per participant. Even if we were testing hypotheses regarding the item parameters (for example, to assess differential item functioning), the rule of 10 times the number of subjects to items could not possibly hold true for all hypotheses, item parameters, item response theory models, and effect sizes.
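The point about subsetting the item bank can be made concrete with a quick simulation. The participant count and the use of random subsets here are our assumptions for illustration, not the study's actual balanced incomplete block design:

```python
import random

# Back-of-the-envelope check (our illustration; the study's actual design and
# sample size are not reproduced here): if each of a hypothetical 1,000
# participants answers a subset of ~250 of the 389 bank items, every item
# still accumulates hundreds of responses for calibration. Random subsets
# stand in for the actual block structure.
def responses_per_item(n_items=389, subset_size=250, n_participants=1000, seed=0):
    rng = random.Random(seed)
    counts = [0] * n_items
    for _ in range(n_participants):
        for item in rng.sample(range(n_items), subset_size):
            counts[item] += 1
    return counts

counts = responses_per_item()
print(min(counts), max(counts))  # every item is answered hundreds of times
```

The relevant quantity for calibration is responses per item, not items per respondent, which is why administering a subset of the bank to each participant does not starve the estimation.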

Carroll suggests that “The sample is mostly of low socioeconomic status (Table 2 in the Gibbons et al article) and perhaps marginally literate.” Table 2 of our article1 indicates that 95% of the sample had a high school degree or beyond. Seventy-four percent of the sample had some college. How exactly does Carroll come to the conclusion that they are “marginally literate”? Furthermore, the sample is quite representative of patients with depression in that it is a mixture of patients being seen at an urban tertiary referral center (Western Psychiatric Institute and Clinic at the University of Pittsburgh) and a local Pittsburgh, Pennsylvania, community mental health center.

Carroll suggests that the “choice of threshold SE ≤ 0.3 around CAT-DI scores is not justified, but this standard error appears too large relative to the scores.” Carroll’s lack of familiarity with modern psychometric theory and computerized adaptive testing is made glaringly apparent in this statement. The threshold of 0.3 SE is quite standard in computerized adaptive testing because it implies reliability in excess of 0.9. This follows from the underlying true score distribution being N(0,1) and the definition of reliability being:

r = σt² / (σt² + σe²) = 1 / (1 + 0.3²) ≈ 0.92

It is unclear why Carroll considers this tradition in computerized adaptive testing to be “too large relative to the scores.” What is the magnitude of the standard errors for the traditional scales that he seems to prefer? Of course, the standard error for an individual score for a traditional test such as the Hamilton Rating Scale for Depression is unknown. Again, it is clear that Carroll lacks the technical expertise to support these criticisms and that his criticism of the CAT-DI is based on something other than science.
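The arithmetic behind the 0.3 threshold can be checked directly; a minimal sketch (the function name and the 0.5 comparison value are ours):

```python
# The reliability implied by a CAT standard-error stopping threshold,
# assuming the latent trait (true score) distribution is scaled to N(0, 1),
# so the true-score variance is 1. The formula is the classical definition
# r = var_true / (var_true + var_error), with var_error = se**2.
def reliability(se, true_var=1.0):
    return true_var / (true_var + se ** 2)

print(round(reliability(0.3), 2))  # 0.92
print(round(reliability(0.5), 2))  # 0.8 -> why 0.3 is the conventional target
```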

Carroll suggests that “The goal of commercial development seems premature; patients risk being ‘assayed’ against a non–gold standard.” We have not proposed the CAT-DI as a gold standard but rather have demonstrated that the test tracks traditional depression measurement scales and the diagnosis of major depressive disorder, yet takes approximately 2 minutes to administer without the need for a clinical interview. Nowhere in the article do we claim that the CAT-DI should replace the clinician. Rather, we suggest that the CAT-DI is useful as a screening tool and a method by which the effectiveness of treatment can be monitored.

Carroll goes on to suggest that we agree with his criticism that “It is not ready for clinical use, as Gibbons et al acknowledged, or for research.” This statement is completely inaccurate. We indicated that the software for web-based distribution would not be available until the end of 2012, not that the test is not ready for clinical use.

Carroll suggests that:

CAT-DI does not deliver clinically useful symptom profiles: exemplar case 2 (Table 3 in the Gibbons et al article) was not assessed for sleep, appetite, concentration, or psychomotor disturbances. Thus, after administering CAT-DI, clinicians would still need to administer a standardized scale to verify DSM-IV diagnostic criteria.

This comment appears to miss the point of the CAT-DI. For different individuals, different constellations of items are required to assess their severity. Not all individuals need to be asked all questions from all domains to estimate their depressive severity level. To verify DSM-IV diagnostic criteria, a DSM-IV clinical interview would be required regardless of what a particular rating scale indicates. The item response theory model underlying the CAT-DI allows different patients to be assessed using different items and still provides valid estimates of the underlying latent variable of interest: depressive severity. This is clearly evident from the fact that the adaptive test results correlate at r = 0.95 with scores from the entire 389-item bank, which does contain items from all depressive subdomains. Carroll also suggests that this is a weakness for longitudinal measurement because the same questions are not repeatedly asked. In fact, this is a distinct advantage of the CAT-DI because different questions can be asked on different occasions, thereby minimizing response-set bias. Furthermore, as we articulated in the article, even greater savings for longitudinal assessments can be obtained by starting the next computerized adaptive test session using the severity estimate from the previous assessment. No such advantage can be realized using traditional psychiatric measurement instruments, and the repeated administration of the same items over the course of a study can lead to biased responses.

In summary, it is very clear that Carroll is not a fan of multidimensional item response theory and computerized adaptive testing as applied to the process of psychiatric measurement. It is, however, completely unclear that his lack of enthusiasm is based on any scientifically rigorous foundation. Indeed, his knowledge of these methods seems lacking.

Finally, Carroll is quick to point out the acknowledged potential conflicts of others as if they have led to bias in reporting of scientific information. In this case, it is Carroll who has the overwhelming conflict of interest. As developer, owner, and marketer of the Carroll Depression Scale–Revised, a traditional fixed-length test, it is not surprising that the paradigm shift described in our article would be of serious concern to him.

ARTICLE INFORMATION

Corresponding Author: Robert D. Gibbons, PhD, University of Chicago, 5841 S Maryland Ave, MC 2007 Office W260, Chicago, IL 60637 (rdg@uchicago.edu).

Conflict of Interest Disclosures: The CAT-DI will ultimately be made available for routine administration and its development as a commercial product is under consideration.

Funding/Support: This work was supported by National Institute of Mental Health grant R01-MH66302.



References

1. Gibbons RD, Weiss DJ, Pilkonis PA, et al. Development of a computerized adaptive test for depression. Arch Gen Psychiatry. 2012;69(11):1104-1112.
2. Nunnally JC. Psychometric Theory. 2nd ed. New York, NY: McGraw-Hill; 1978.

Correspondence

July 1, 2013
Bernard J. Carroll, MBBS, PhD, FRCPsych
Pacific Behavioral Research Foundation, Carmel, California
JAMA Psychiatry. 2013;70(7):763. doi:10.1001/jamapsychiatry.2013.1318.
