The Sensitivity and Specificity of the 9-Item Patient Health Questionnaire When Screening Stroke Survivors for Poststroke Depression
Annually, approximately 800 000 people in the United States experience a stroke, which is the fifth leading cause of death in the United States.1 Medical complications frequently occur after stroke, contributing to increased length of stay, higher inpatient readmission rates, higher medical costs, delays in entering rehabilitation, higher mortality rates, and poststroke depression (PSD).2 Poststroke depression has devastating personal consequences for recovery, including lower quality of life and functional status.3 Stroke is recognized in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), along with pulmonary disease, cancer, autoimmune diseases, metabolic disturbances, gastrointestinal diseases, cardiovascular diseases, and endocrine diseases, as one of only 8 medical conditions that cause depressive symptoms.4 Poststroke depression may follow either an ischemic stroke or a hemorrhagic stroke. The symptom profile patterns of depression are believed to be similar among individuals who have experienced a stroke and those in the general population who develop depression.5
Effective screening for early symptoms of PSD could assist clinicians in earlier identification of PSD. However, when choosing an appropriate screening tool, it is important that the tool exhibit good sensitivity and specificity, as well as effective positive predictive value (PPV) and negative predictive value (NPV), to be considered valid. One screening tool recommended by the American Heart Association and American Stroke Association for PSD is the 9-item Patient Health Questionnaire (PHQ-9).6 The PHQ-9 can be used to screen, diagnose, monitor, and measure the severity of depression with an individual summed score ranging from 0 (no depressive symptoms) to 27 (all symptoms occurring daily).7 The purpose of this article is to identify the sensitivity and specificity of the PHQ-9 for use in depression screening among stroke survivors through a review of the literature.
Prevalence of PSD
Whereas stroke is the second leading cause of death worldwide, depression ranks second as the leading cause of disability around the world.8 Estimates of PSD prevalence are difficult to determine and vary based on the clinical setting in which stroke survivors are evaluated and criteria used to diagnose PSD. Thus, reported PSD prevalence rates vary widely from 25% to 79% of all stroke survivors.9,10 PSD is often misdiagnosed and undertreated despite its high prevalence rates, with an estimated 50% of stroke survivors with PSD undiagnosed and thus untreated.11 Underdiagnosis of PSD may result in serious adverse outcomes, namely, stroke recurrence at 1 year, subtherapeutic recovery, decreased consistency with therapies, poorer functional outcome, greater medical costs, increased social isolation and lower quality of life, and up to 10 times increased mortality.11
Beginning January 1, 2018, the Joint Commission no longer required Comprehensive Stroke Centers to assess stroke patients for depression before discharge.12 The removal of required screening for PSD came soon after a joint publication by the American Heart Association and the American Stroke Association of clinical practice guidelines that continued to recommend that all stroke patients be routinely screened for depression using a valid depression screening tool.13 However, these guidelines were developed with limited high-quality evidence to indicate that screening for PSD contributes to improved stroke outcomes.6 The Structured Clinical Interview for DSM-5 Disorders, Clinician Version (SCID-5-CV) is considered the criterion standard for diagnosing depression, and this same criterion is most often used to diagnose PSD.14
Validation of Screening Tools
Calculation of 4 objective measures is required for validation of a screening tool’s performance: sensitivity, specificity, PPV, and NPV.15 The ideal screening tool would identify subjects with and without the disease with 100% accuracy.15 The PPV and NPV, which are related to sensitivity and specificity through disease prevalence, are vital indicators of diagnostic accuracy.16 The likelihood that a disease is present given a positive test result is the PPV, whereas the likelihood that a disease is not present given a negative test result is the NPV.7 The sensitivity of a diagnostic tool quantifies its capacity to correctly identify subjects with the disease state. Specificity is the tool’s capacity to correctly identify subjects without the disease condition.16 Finally, distinguishing between disease positive and disease negative requires a predetermined cutoff value: test results equal to or greater than this value are determined to be positive (T+), and test results less than this value are determined to be negative (T−).16 The designation of the cutoff value determines the rates of true-positive, true-negative, false-positive, and false-negative test results.17 Receiver operating characteristic curve analysis, which is a graphical illustration of the relationship between a tool’s sensitivity and its specificity, is one of the most commonly used methods to analyze the effectiveness of a diagnostic test.17 In a perfect tool, both the sensitivity and specificity are equal to “1.” The value for the area under the receiver operating characteristic curve (AUC) varies between 0.5, which indicates a test that discriminates no better than chance, and 1.0 for a perfectly accurate test.17
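The relationship among these measures can be illustrated with a minimal sketch in Python. The scores and disease labels below are invented for illustration only and do not come from any study in this review; the function names are likewise hypothetical. The sketch classifies each score against a cutoff, tallies the 4 cells of the resulting 2 × 2 table, derives the 4 validity measures from those cells, and estimates the AUC as the proportion of diseased/nondiseased pairs in which the diseased subject has the higher score (ties counted as one half).

```python
def confusion_counts(scores, has_disease, cutoff):
    """Tally TP, FP, TN, FN when scores >= cutoff are called test-positive (T+)."""
    tp = fp = tn = fn = 0
    for score, diseased in zip(scores, has_disease):
        test_positive = score >= cutoff
        if test_positive and diseased:
            tp += 1
        elif test_positive and not diseased:
            fp += 1
        elif not test_positive and diseased:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def validity_measures(tp, fp, tn, fn):
    """Derive the 4 measures from the 2 x 2 table (assumes no empty row or column)."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased subjects correctly flagged
        "specificity": tn / (tn + fp),  # nondiseased subjects correctly cleared
        "ppv": tp / (tp + fp),          # P(disease | positive test)
        "npv": tn / (tn + fn),          # P(no disease | negative test)
    }


def auc(scores, has_disease):
    """AUC as the share of diseased/nondiseased pairs ranked correctly (ties = 0.5)."""
    diseased = [s for s, d in zip(scores, has_disease) if d]
    healthy = [s for s, d in zip(scores, has_disease) if not d]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in diseased
        for n in healthy
    )
    return wins / (len(diseased) * len(healthy))


# Invented example data: 10 screening scores and a reference-standard diagnosis.
scores = [1, 3, 5, 8, 11, 6, 12, 15, 19, 22]
has_disease = [False, False, False, False, False, True, True, True, True, True]

counts = confusion_counts(scores, has_disease, cutoff=10)
print(counts)
print(validity_measures(*counts))
print(auc(scores, has_disease))
```

Rerunning the sketch with a different cutoff shows how moving the threshold trades sensitivity against specificity, whereas the AUC, which does not depend on any single cutoff, stays the same.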
The Patient Health Questionnaire (PHQ-9)
The PHQ-9 is a pragmatic screening tool that assesses for the DSM-5 criteria.7 When screening with the PHQ-9, persons are asked to self-report any of the 9 depressive symptoms they have experienced for the past 2 weeks and with what frequency they have experienced them (not at all, several days, more than half the days, or nearly every day).7 The PHQ-9 is a self-report version of the Primary Care Evaluation of Mental Disorders diagnostic instrument for common mental disorders and is composed of a 9-item scale that assesses the 9 DSM-5 depression symptom criteria for occurrence in the 2 weeks before screening.7 The first 2 questions of the PHQ-9 assess anhedonia (inability to feel pleasure) (Q1: “Little interest or pleasure in doing things”) and mood (Q2: “Feeling down, depressed, or hopeless”). Persons reporting 4 or more out of 9 depressive symptoms for more than half the days or nearly every day, including questions 1 and 2, are considered to have a positive depression screen requiring follow-up for diagnosis.7 Major depression is diagnosed if 5 or more of the 9 depressive symptom criteria have occurred at least “more than half the days” in the past 2 weeks and one of the symptoms corresponds to question 1 or 2. A self-report of the 1 symptom, “thoughts that you would be better off dead or of hurting yourself in some way,” is considered a positive screen, regardless of other symptoms reported or duration.7 Moreover, PHQ-9 scores of 5, 10, 15, and 20 represent the lower thresholds for mild, moderate, moderately severe, and severe depression, respectively. A score of 10 or greater is considered indicative of major depression, which requires follow-up diagnosis and treatment.18
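The summed-score approach described above can be sketched in Python as follows. This is an illustrative sketch, not a clinical implementation: it assumes each item is coded 0 (not at all) to 3 (nearly every day), which is consistent with the 0 to 27 score range; the label “minimal” for totals below 5 and all function names are assumptions introduced here; and the item 9 flag simply reflects the statement that any endorsement of that item counts as a positive screen.

```python
# Lower thresholds for each severity band, highest first.
# The "minimal" label for totals of 0-4 is an assumption for illustration.
SEVERITY_THRESHOLDS = [
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
    (0, "minimal"),
]


def phq9_total(item_scores):
    """Sum the 9 item scores, each assumed to be coded 0-3."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    return sum(item_scores)


def phq9_severity(total):
    """Map a summed score (0-27) to its severity band."""
    for threshold, label in SEVERITY_THRESHOLDS:
        if total >= threshold:
            return label
    raise ValueError("total must be between 0 and 27")


def phq9_screen(item_scores, cutoff=10):
    """Return total, severity band, cutoff result, and the item 9 flag."""
    total = phq9_total(item_scores)
    return {
        "total": total,
        "severity": phq9_severity(total),
        "positive_at_cutoff": total >= cutoff,
        # Item 9 (thoughts of death or self-harm) is flagged if endorsed at all.
        "item9_endorsed": item_scores[8] > 0,
    }


# Invented responses for one hypothetical respondent.
print(phq9_screen([2, 2, 1, 1, 1, 1, 1, 1, 0]))
```

Keeping the cutoff as a parameter reflects the fact that the studies discussed later in this review did not all use the same cutoff score.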
Initial validity of the PHQ-9 was established in a study with a sample of 3000 adult patients evaluated for depression by 62 primary care physicians (21 general internal medicine, 41 family practice). The study found good agreement between PHQ diagnoses and those of independent mental health professionals who diagnosed subjects using the DSM-5 (SCID-5-CV) within 48 hours of the PHQ-9’s completion (for the diagnosis of any 1 or more PHQ disorder, κ = 0.65; overall accuracy, 85%; sensitivity, 75%; specificity, 90%).7 In subsequent research conducted with 1422 patients from 5 general internal medicine clinics, 1578 patients from 3 family practice clinics, and an additional 3000 subjects (age range, 18–99 years; mean, 31 years) from 7 obstetrics-gynecology sites, researchers achieved 88% sensitivity and 95% specificity for major depression, using a PHQ-9 cutoff score of 10 or greater. Although postpartum depression is recognized as a unique condition, the researchers did not specify whether postpartum women were included in the study sample, among the obstetrics-gynecology or other sites, which limits the generalizability of the findings.7 Since the initial studies, the PHQ-9 has been used to screen for depression in populations with chronic conditions. For example, in a study in which patients with chronic hepatitis C were screened for depression with the PHQ-9, the PHQ-9 had an overall accuracy of 90.43%, with a sensitivity of 83.84% and a specificity of 97.01%.19 It has also shown a sensitivity of 54% and a specificity of 85% when screening for depression in a population of patients with Parkinson disease.20
The following databases were searched for published peer-reviewed studies related to screening for PSD using the PHQ-9 depression screening tool: CINAHL, Health Source/Nursing, PsycInfo, MEDLINE, MEDLINE with full text, Academic Search Complete, EBSCO, and ERIC. The 3 primary search strings, each combining terms with the Boolean operator AND, were “PHQ-9 AND validity,” “PHQ-9 AND depression AND stroke,” and “stroke AND depression AND screening.” Articles were screened using the following inclusion criteria: peer-reviewed primary research studies published between 2012 and 2018, adult study participants (≥18 years old), English language, and use of the PHQ-9 screening tool to screen for depression specifically in stroke survivors. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework was used to report the findings of this literature review.21 The literature search process is depicted in Figure 1.
Each primary study was graded for evidence level using the American Association of Critical-Care Nurses’ levels of evidence, which rank evidence on a scale of A to M to ensure that evidence is valid and credible. Levels A and B are awarded to research articles with an experimental design.22 Levels A, B, and C are all based on research (either nonexperimental or experimental designs), which is regarded as evidence. Levels D, E, and M are considered recommendations drawn from theory, articles, or manufacturers’ recommendations.22
As per the Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagram, a total of 6 studies were deemed relevant to this review. Study author(s), study design, level of evidence, characteristics of sample populations, timing and location of depression screening, and primary findings are summarized in Table 1 (Supplemental Digital Content 1, http://links.lww.com/JNN/A166). Five studies included in this review met all preestablished inclusion criteria.23–27 The sixth study did not meet all inclusion criteria because it was published before 2012 but was included because it is considered seminal research on this topic.28 All 6 studies included in this review were at level “B.” Included studies represented an overall sample size of 881 stroke survivors and 49 survivors of transient ischemic attack.
Screening stroke survivors for PSD using the PHQ-9 was completed in person and in various clinical settings in 5 of the studies, and the time of screening in relation to the stroke episode varied from 14 days or less post stroke to 20.1 months post stroke.23–26,28 The sixth study did not conduct depression screenings but extrapolated data from records via a retrospective chart review.27 Two studies did not report a value for AUC,23,27 and one did not report the PHQ-9’s sensitivity, specificity, NPV, or PPV.27
Only 2 of the 6 studies reviewed reported data on all the components necessary (the PHQ-9’s sensitivity, specificity, PPV, NPV, AUC, and cutoff point) to determine the validity of the PHQ-9 when screening stroke survivors for depression, and results varied between these 2 studies.23,25 One of these 2 studies demonstrated both a less-than-acceptable specificity and a less-than-acceptable PPV.23
Among the 5 studies that reported findings in relation to measures of validity, the PHQ-9’s sensitivity ranged from 69% to 100% and specificity ranged from 63% to 97.1%.23–26,28 Four of the 6 studies reported data on AUC, which ranged from 0.86 to 0.96.24–26,28 Three of the 6 studies reported both PPV, which ranged from 32% to 75%, and NPV, which ranged from 78% to 100%.23–25
Between 2012 and 2018, only 5 peer-reviewed articles of original research on screening stroke survivors for depression using the PHQ-9 were added to the literature. That so few studies have focused on this topic is concerning given the prevalence of PSD. Only 2 of the 6 studies reviewed reported data on all the components necessary to determine the validity of the PHQ-9 when screening stroke survivors for depression, results varied between those 2 studies, and 1 of the 2 demonstrated both a less-than-acceptable specificity and a less-than-acceptable PPV. Thus, although the PHQ-9 is widely used to screen stroke survivors for PSD, the results of this literature review indicate that its validity for use in this population remains inconclusive.
Findings from this review mirror results of a published meta-analysis of 24 studies aimed at identifying the most accurate depression screening tool to detect PSD.29 Three studies included in this literature review were also included in the meta-analysis,24,26,28 with results from both the meta-analysis and this literature review noting that the PHQ-9 had a sensitivity of 86% and a specificity of 79% when used to screen stroke patients for PSD. Both this literature review and the meta-analysis found the PHQ-9’s rule-in clinical utility for depression was “fair” whereas its rule-out clinical utility was “good.”29 This seems to suggest the PHQ-9 is better suited as a screening tool to identify stroke survivors without depression rather than stroke survivors with depression.29
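The difference between rule-in and rule-out utility can be illustrated by converting sensitivity and specificity into predictive values at a given prevalence. The Python sketch below is illustrative only: it assumes that the pooled sensitivity (86%) and specificity (79%) apply unchanged across settings, it uses the two ends of the reported PSD prevalence range (25% and 79%) as example inputs, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled accuracy values quoted in the text; prevalence values are the two
# ends of the reported PSD prevalence range, used here only as examples.
for prevalence in (0.25, 0.79):
    ppv, npv = predictive_values(0.86, 0.79, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

Under these assumptions, a prevalence of 25% gives a PPV of about 0.58 and an NPV of about 0.94, whereas a prevalence of 79% gives a PPV of about 0.94 and an NPV of about 0.60. The predictive values therefore depend heavily on the prevalence of PSD in the setting where screening occurs.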
This review included only primary research articles published in English in peer-reviewed journals from 2012 to 2018. Potentially relevant unpublished dissertations and peer-reviewed articles published before 2012 were not included, other than the seminal work by Williams et al.28 Finally, although PSD affects both individuals with acute stroke and those in recovery from a more remote stroke, variation between studies in the timing of PHQ-9 administration after stroke onset may limit generalizability of the review’s findings.
The findings of this literature review highlight challenges within the research studies identified, including small and homogeneous study populations, differences in severity of stroke, varied and narrow inclusion and exclusion criteria, omission of aphasic stroke survivors or stroke survivors with a previous diagnosis of depression, and failure to consistently report type of stroke. Reflection on depressive symptoms that occurred in the 2 weeks before screening may be challenging for stroke survivors during the acute care stage, given that the average hospital length of stay for acute ischemic stroke patients continues to decrease, going from 6.9 ± 4.2 days in 1993–1994 to 4.66 ± 3 days in 2006–2007.30 Thus, any symptoms reported may reflect a prestroke status rather than a poststroke status. Furthermore, in the 6 studies included in this literature review, the severity of stroke in the sample was not reported, screening did not occur in the same clinical setting or same period post stroke, cutoff scores for the PHQ-9 were not consistent across studies, and not all results were compared against the criterion standard for diagnosing mood disorders (SCID-5-CV).
Poststroke depression often goes unrecognized, underdiagnosed, and undertreated despite occurring frequently post stroke. Clinical guidelines highly recommend routine screening for PSD as part of overall stroke care. For this reason, it is critical for researchers to address the variances highlighted in this review to help identify the most appropriate and valid tool to use when screening stroke survivors for PSD. Although widely used to screen for depression, the PHQ-9 does not yet show consistent measures of validity for use in stroke populations. Further research is warranted to demonstrate consistent validity of the tool, and additional reviews of other depression screening tools should also be conducted.
1. Mozaffarian D, Benjamin EJ, Go AS, Arnett DK, Blaha MJ. Heart disease and stroke statistics—2016 update: a report from the American Heart Association. Circulation
2. Ingeman A, Andersen G, Hundborg HH, Svendsen ML, Johnsen SP. In-hospital medical complications, length of stay, and mortality among stroke unit patients. Stroke. 2011;42(11):3214–3218. doi:10.1161/strokeaha.110.610881.
3. Žikić TR, Divjak I, Jovićević M, et al. The effect of post stroke depression on functional outcome and quality of life. Acta Clin Croat
4. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders: DSM-5. 5th ed. 2013.
5. de Man-van Ginkel JM, Hafsteinsdóttir TB, Lindeman E, Geerlings MI, Grobbee DE, Schuurmans MJ. Clinical manifestation of depression after stroke: is it different from depression in other patient populations? PLoS One. 2015;10(12):e0144450. doi:10.1371/journal.pone.0144450.
6. Towfighi A, Ovbiagele B, El Husseini N, et al. Poststroke depression: a scientific statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke. 2017;48(2):e30–e43. doi:10.1161/str.0000000000000113.
7. Spitzer RL, Kroenke K, Williams JB. Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. Primary Care Evaluation of Mental Disorders. Patient Health Questionnaire. JAMA
8. Ferrari AJ, Charlson FJ, Norman RE, et al. Burden of depressive disorders by country, sex, age, and year: findings from the Global Burden of Disease Study 2010. PLoS Med. 2013;10(11):e1001547. doi:10.1371/journal.pmed.1001547.
9. Lökk J, Delbari A. Management of depression in elderly stroke patients. Neuropsychiatr Dis Treat. 2010;539–549. doi:10.2147/ndt.s7637.
10. Robinson RG, Jorge RE. Post-stroke depression: a review. Am J Psychiatry. 2016;173(3):221–231. doi:10.1176/appi.ajp.2015.15030363.
11. Espárrago Llorca G, Castilla-Guerra L, Fernández Moreno MC, Ruiz Doblado S, Jiménez Hernández MD. Post-stroke depression: an update. Neurol. 2015;30(1):23–31. doi:10.1016/j.nrleng.2012.06.006.
13. Miller EL, Murray L, Richards L, et al. Comprehensive overview of nursing and interdisciplinary rehabilitation care of the stroke patient: a scientific statement from the American Heart Association. Stroke. 2010;41(10):2402–2448. doi:10.1161/str.0b013e3181e7512b.
14. First M, Williams J, Karg R, Spitzer R. Structured Clinical Interview for DSM-5 Disorders, Clinician Version (SCID-5-CV). Arlington, VA: American Psychiatric Association; 2016.
15. Molinaro A. Diagnostic tests: how to estimate the positive predictive value. Pract. 2015;2(4):162–166. doi:10.1093/nop/npv030.
16. Habibzadeh F, Habibzadeh P, Yadollahie M. On determining the most appropriate test cut-off value: the case of tests with continuous results. Biochem Med (Zagreb). 2016;297–307. doi:10.11613/bm.2016.034.
17. Unal I. Defining an optimal cut-point value in ROC analysis: an alternative approach. Comput Math Methods Med. 2017;2017:1–14. doi:10.1155/2017/3762651.
18. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med
19. Navinés R, Castellví P, Moreno-España J, et al. Depressive and anxiety disorders in chronic hepatitis C patients: reliability and validity of the Patient Health Questionnaire. J Affect Disord. 2012;138(3):343–351. doi:10.1016/j.jad.2012.01.018.
20. Thompson AW, Liu H, Hays RD, et al. Diagnostic accuracy and agreement across three depression assessment measures for Parkinson’s disease. Parkinsonism Relat Disord. 2011;17(1):40–45. doi:10.1016/j.parkreldis.2010.10.007.
21. Moher D, Shamseer L, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4:1. doi:10.1186/2046-4053-4-1.
22. Peterson MH, Barnason S, Donnelly B, et al. Choosing the best evidence to guide clinical practice: application of AACN levels of evidence. Crit Care Nurse. 2014;34(2):58–68. doi:10.4037/ccn2014411.
23. de Man-van Ginkel JM, Hafsteinsdóttir T, Lindeman E, Burger H, Grobbee D, Schuurmans M. An efficient way to detect poststroke depression by subsequent administration of a 9-item and a 2-item Patient Health Questionnaire. Stroke. 2012;43(3):854–856. doi:10.1161/strokeaha.111.640276.
24. de Man-van Ginkel JM, Gooskens F, Schepers VP, Schuurmans MJ, Lindeman E, Hafsteinsdóttir TB. Screening for poststroke depression using the Patient Health Questionnaire. Nurs Res. 2012;61(5):333–341. doi:10.1097/nnr.0b013e31825d9e9e.
25. Prisnie JC, Fiest KM, Coutts SB, et al. Validating screening tools for depression in stroke and transient ischemic attack patients. Int J Psychiatry Med. 2016;51(3):262–277. doi:10.1177/0091217416652616.
26. Turner A, Hambridge J, White J, et al. Depression screening: a comparison of alternative measures with the Structured Diagnostic Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (major depressive episode) as criterion standard. Stroke
27. Vermeer J, Rice D, McIntyre A, Viana R, Macaluso S, Teasell R. Correlates of depressive symptoms in individuals attending outpatient stroke clinics. Disabil Rehabil. 2016;39(1):43–49. doi:10.3109/09638288.2016.1140837.
28. Williams LS, Brizendine EJ, Plue L, et al. Performance of the PHQ-9 as a screening tool for depression after stroke. Stroke. 2005;36(3):635–638. doi:10.1161/01.str.0000155688.18207.33.
29. Meader N, Moe-Byrne T, Llewellyn A, Mitchell AJ. Screening for poststroke major depression: a meta-analysis of diagnostic validity studies. J Neurol Neurosurg Psychiatry. 2014;85(2):198–206. doi:10.1136/jnnp-2012-304194.
30. Yacoub HA, Al-Qudah ZA, Khan HM, Farhad K, Ji AB, Souayah N. Trends in outcome and hospitalization cost among adult patients with acute ischemic stroke in the United States. J Vasc Intern Neurol