Research Column: Why Measurement Matters: A Systematic Inquiry of Group Therapy Outcome Studies

Sean Woodland, Marie Ricks, Griffith Jones, & Kyle Lindsay
Brigham Young University

Conclusions regarding treatment efficacy are best justified when they rest on a high level of measurement precision, rigorous methods, and inferences that accurately reflect the research question (Cook & Campbell, 1979; Bednar, Burlingame, & Masters, 1988). Measurement precision is a foundational piece that, when ignored, can undermine the ability to assert therapy effectiveness. Several proposed frameworks for better understanding outcome measures illustrate that the level of rigor applied is directly related to the strength of a study’s inferences (Bednar, Burlingame, & Masters, 1988; Burlingame et al., 2005; Erbes et al., 2004; Lambert, Ogles, & Masters, 1992). Unfortunately, the psychometric properties of outcome measures often go unreported, which leads to unneeded diversity in the measures used and to either chaos or apathy in measurement selection (Lambert, Ogles, & Masters, 1992). The aim of our study was to ascertain whether these issues hold true in studies of group psychotherapy.

Group therapy outcome studies (n=89) were obtained through extensive literature searches. To be included, samples needed to have at least 12 participants, with a minimum of two-thirds primarily diagnosable with either schizophrenia or borderline personality disorder. Outcome measures (n=197) were extracted from these studies and reviewed for listing of normative and local psychometrics (i.e., reporting of reliability and/or validity). Outcome domains included depression, anxiety, general mental health, quality of life, and disorder-specific symptoms. Measures were excluded from the study if articles cited as seminal were unobtainable. All literature searches were completed twice to ensure accuracy.

Of the measures analyzed, 31 (15.8%) were “investigator-generated” measures created specifically for the outcome study without prior use or citation. Each study used an average of 4.68 measures (SD=2.81). The average number of validities reported within each outcome study was 1.09 (SD=2.56), making up 19% of the sample of measures. The average number of reliabilities reported was 1.63 (SD=2.84), representing 45.1% of the average number of measures used per study. Evidence of previous internal consistency was reported 64 times. The average number of validities found per measure was 1.57 (SD=1.60). For measures used at least three times, the average number of validities increased to 3.20 (SD=1.39). The most common type of validity was criterion-related, followed by construct, discriminant, convergent, factorial, and content validity.
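As a rough illustration of how per-study tallies like these can be summarized, the Python sketch below computes means, a standard deviation, and two different reporting rates. The records are hypothetical toy values, not data from our sample, and the function and field names are assumptions made only for this example. It is not the procedure used in the study.

```python
from statistics import mean, stdev

# Hypothetical toy records, NOT data from the study. Each tuple is
# (measures used, measures with reliability reported,
#  measures with validity reported) for one outcome study.
studies = [
    (4, 2, 1),
    (6, 0, 0),
    (2, 2, 1),
    (8, 3, 2),
]


def summarize(records):
    """Return descriptive statistics for a list of per-study tallies."""
    used = [r[0] for r in records]
    reliabilities = [r[1] for r in records]
    validities = [r[2] for r in records]
    return {
        "mean_measures": mean(used),
        "sd_measures": stdev(used),  # sample standard deviation
        "mean_reliabilities": mean(reliabilities),
        "mean_validities": mean(validities),
        # Pooled rate: all reported psychometrics over all measures used.
        "pooled_reliability_rate": sum(reliabilities) / sum(used),
        "pooled_validity_rate": sum(validities) / sum(used),
        # Mean of per-study rates: every study counts equally,
        # regardless of how many measures it used.
        "per_study_reliability_rate": mean(
            rel / n for n, rel, _ in records
        ),
    }


summary = summarize(studies)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The pooled rate and the mean of per-study rates can differ noticeably when studies use different numbers of measures, so it helps to state which one a reported percentage refers to.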

While fairly simple, the results are notable. The studies in our sample reported instrument reliability less than 30% of the time and validity less than 25% of the time; this indicates chronic underreporting of both normative and local psychometrics. Also notable was the fact that 16% of the measures were “investigator-generated.” Combined with the result that fewer than half of the 197 measures were used more than once, this implies an overabundance of instruments measuring similar constructs, and raises questions about creating new measures rather than relying on those previously validated. We recommend that previously validated instruments be favored in future studies to better ensure measurement precision. Further, we suggest that group researchers include more psychometric data in their methods sections to decrease doubt about the rigor of the measures they use in outcome studies.


Bednar, R. L., Burlingame, G. M., & Masters, K. S. (1988). Systems of family treatment: Substance or semantics? Annual Review of Psychology, 39(1), 401-434.

Burlingame, G. M., Dunn, T. W., Chen, S., Lehman, A., Axman, R., Earnshaw, D., & Rees, F. M. (2005). Special section on the GAF: Selection of outcome assessment instruments for inpatients with severe and persistent mental illness. Psychiatric Services, 56(4), 444-451.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston, MA: Houghton Mifflin.

Erbes, C., Polusny, M. A., Billig, J., Mylan, M., McGuire, K., Isenhart, C., & Olson, D. (2004). Developing and applying a systematic process for evaluation of clinical outcome assessment instruments. Psychological Services, 1(1), 31.

Lambert, M. J., Ogles, B. M., & Masters, K. S. (1992). Choosing outcome assessment devices: An organizational and conceptual scheme. Journal of Counseling & Development, 70(4), 527-532.
