Original Articles

Evidence-based Evaluation of Magnetic Resonance Imaging as a Diagnostic Tool in Dementia Workup

Wahlund, Lars-Olof PhD, MD*; Almkvist, Ove PhD*; Blennow, Kaj PhD, MD; Engedahl, Knut PhD, MD§; Johansson, Aki PhD; Waldemar, Gunhild PhD, MD; Wolf, Henrike PhD, MD

Topics in Magnetic Resonance Imaging: December 2005 - Volume 16 - Issue 6 - p 427-437
doi: 10.1097/01.rmr.0000245463.36148.12


The diagnostic workup and management of a patient with cognitive complaints and symptoms are a multidisciplinary task, involving physicians from several medical specialties, including radiologists. A diagnostic evaluation should be initiated for all patients with symptoms that appear to either persist or worsen, as well as patients in whom the complaints are associated with other cognitive or behavioral changes, or with impaired activities of daily living. Diagnostic evaluation should be considered even when symptoms are not sufficiently severe to fulfill the international criteria for dementia, given that patients with mild symptoms may have potentially reversible conditions that need appropriate management.

The dementia syndrome is diagnosed using specific criteria, such as Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition1 or International Statistical Classification of Diseases, 10th Revision.2 There are no specific diagnostic markers for the most common dementia disorders. Thus, the specific underlying diseases that cause cognitive impairment and dementia are diagnosed using operational clinical diagnostic criteria.3

The diagnostic workup uses many sources of information, the patient's history being the most important. The medical history obtained from the patient should be supplemented by information from relatives or other informants. When possible, a number of investigations are usually recommended in the diagnostic workup of patients.3

  • physical, including neurological, examination
  • neuropsychological assessment
  • psychiatric/behavioral assessment
  • evaluation of activities of daily living
  • laboratory screening (blood) tests and electrocardiogram
  • neuroimaging methods (magnetic resonance imaging [MRI] or computed tomography [CT])

Structural imaging (CT and MRI) is used to find secondary disorders, such as tumors, subdural hematomas, and normal pressure hydrocephalus. Structural imaging has gradually become more important as the quality of the images has improved. This is evident from several recent guidelines published by the American Academy of Neurology4,5 and the European Federation of Neurological Societies.3 Both guidelines strongly recommend CT or MRI in the diagnostic workup to show specific abnormalities. The recently revised European Federation of Neurological Societies guidelines3 recommended MRI to increase specificity in cases of diagnostic uncertainty for which there is suspicion of Alzheimer disease (AD).

The introduction of MRI created opportunities to study more subtle brain structures, such as the hippocampus (HC) and entorhinal cortex (EC). Much research has been conducted on such brain structures in relation to dementia, specifically AD. The reason is that the HC and EC are related to the very early pathological changes associated with AD. Thus, 1 important question is whether atrophy in these areas could serve as diagnostic markers for AD. The Diagnosis in Evidence-based Dementia Practice of Qizilbash et al,6 published in 2002, points out the lack of systematic analyses of diagnostic methods (likelihood ratios, predictive values, sensitivity, and specificity) for use in dementia workups.


This review focuses on MRI in the diagnostic evaluation of patients presenting with cognitive complaints and symptoms. The aim is to explore the role and validity of MRI in differentiating AD from healthy aging and other dementias (OD).


Selection of Articles

To study MRI as a diagnostic test, we used evidence-based medicine techniques and searched for relevant literature with the following strategy:


The search covered articles published from 1980 to 2004 and yielded 434 hits.

To select relevant articles for review, we formulated the following inclusion and exclusion criteria.

Inclusion Criteria for Articles

  • The articles included in this study had to describe at least 20 cases and 20 controls, or at least 30 cases (in studies for which controls are not appropriate).
  • The patients had to have been appropriately examined for dementia, including physical and psychiatric examinations, cognitive tests, blood tests, and imaging of the brain.
  • The patients had to have been diagnosed in accordance with well-known and standardized clinical or neuropathologic criteria.
  • Appropriate statistical methods had to have been used. Information had to be available to calculate the test's sensitivity, specificity, and likelihood ratios.
  • The articles had to have been written in English and published between 1980 and July 2004.

Exclusion Criteria

Because of differences in criteria regarding the selection of studies, meta-analysis articles were excluded.

Quality Assessment of the Studies

The articles were quality classified according to (a) the design of the study, (b) selection of patients, (c) control and contrast groups, and (d) the setting in which they were carried out, such as university hospitals, memory clinics, or outpatient clinics.6

Histopathology as the diagnostic gold standard was not available in most studies. Clinical diagnosis according to specific criteria was used as a "surrogate" gold standard. That was consistent with the suggestion of Qizilbash et al in Diagnosis in Evidence-based Dementia Practice.6

The articles representing the highest class of study quality were defined as 1a articles. They described prospective studies on a broad spectrum of patients and controls (population-based studies and consecutive series of a broad spectrum of patients) who were followed up with clinical diagnostic assessments over time and examined postmortem. The lowest class of evidence referred to 2b articles from cross-sectional studies on highly selected patients and controls (Table 1).

Classification of Study Quality

To explain how the results are presented, we must clarify some important concepts. The results are presented as the diagnostic test's sensitivity, specificity, and likelihood ratio for a positive (LR+) or negative (LR−) result. The reason that results are presented with likelihood ratios rather than negative and positive predictive values is that the likelihood ratio is a robust measure independent of prevalence rates in the tested populations.

Below is a presentation of the concept of sensitivity, specificity, and likelihood ratios described in relation to pretest and posttest probability.

Sensitivity and Specificity

Sensitivity is defined as the probability that a test gives a positive result in a person with the target disease. Specificity is defined as the probability that a test gives a negative result in a person without the target disease.
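As a minimal sketch of these two definitions, the Python snippet below computes sensitivity and specificity from the four cells of a 2 × 2 table. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as fractions between 0 and 1.

    tp: persons with the disease and a positive test (true positives)
    fn: persons with the disease and a negative test (false negatives)
    tn: persons without the disease and a negative test (true negatives)
    fp: persons without the disease and a positive test (false positives)
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased person")
    sensitivity = tp / (tp + fn)   # positive tests among the diseased
    specificity = tn / (tn + fp)   # negative tests among the non-diseased
    return sensitivity, specificity


# Hypothetical counts: 40 patients (34 test positive), 40 controls (36 test negative).
sens, spec = sensitivity_specificity(tp=34, fn=6, tn=36, fp=4)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```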

Pretest Probability

The clinician's impression of a patient is of importance for the pretest probability of a disease, dementia in this case. By means of an interview with the patient and a caregiver, an experienced clinician obtains information that makes the pretest probability of dementia very high. Because a less experienced clinician cannot take equal advantage of information from such an interview, the pretest probability will be lower. In the absence of information about the symptoms of a disease, the pretest probability will be equal to the prevalence of the disease in the person's age cohort. For instance, the pretest probability of dementia in an unexamined and unselected population of a cohort of people aged 50 years will be very low, less than 1%, whereas the pretest probability among elderly older than 80 years admitted to a memory clinic because of memory complaints will be very high, more than 50%. In the first case, powerful tests are needed to detect dementia, whereas less powerful tests may be beneficial among the elderly with memory complaints.

Posttest Probability

The posttest probability is the pretest probability modified (increased or decreased) by a diagnostic test.

Likelihood Ratio

The likelihood ratio for a positive test result (LR+) is defined as the probability of a positive result in a person with the target disease divided by the probability of a positive result in a person without the target disease. That is equivalent to the ratio of the true-positive rate to the false-positive rate: LR+ = sensitivity/(100% − specificity).

The likelihood ratio for a negative test result (LR−) is defined as the probability of a negative result in a person with the target disease divided by the probability of a negative result in a person without the target disease. That is equivalent to the ratio of the false-negative rate to the true-negative rate: LR− = (100% − sensitivity)/specificity.
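The two ratios can be sketched in a few lines of Python. The function below is an illustrative assumption rather than a method used in the reviewed articles; it takes sensitivity and specificity as fractions (0 to 1) instead of percentages, and the input values are hypothetical. A specificity of 1.0 gives an infinite LR+, because the denominator is zero.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for sensitivity and specificity given as fractions."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie between 0 and 1")

    # LR+ = sensitivity / (1 - specificity)
    if specificity < 1.0:
        lr_pos = sensitivity / (1.0 - specificity)
    else:
        # No false positives: infinite LR+ (undefined if sensitivity is also 0).
        lr_pos = math.inf if sensitivity > 0.0 else math.nan

    # LR- = (1 - sensitivity) / specificity
    if specificity > 0.0:
        lr_neg = (1.0 - sensitivity) / specificity
    else:
        # No true negatives: infinite LR- (undefined if sensitivity is also 1).
        lr_neg = math.inf if sensitivity < 1.0 else math.nan

    return lr_pos, lr_neg


# Hypothetical test with 85% sensitivity and 90% specificity.
lr_pos, lr_neg = likelihood_ratios(0.85, 0.90)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```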

In other words, the likelihood ratio expresses a test's discriminatory power and indicates the degree to which the pretest probability will be increased or decreased. There are some practical guidelines for evaluating the power of LR+ and LR− (Table 2).
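How a likelihood ratio modifies the pretest probability can be illustrated with the standard odds form of Bayes' theorem: the pretest probability is converted to odds, multiplied by the likelihood ratio, and converted back to a probability. The sketch below assumes a hypothetical LR+ of 8.5 and uses pretest probabilities of 1% and 50%, the boundary figures mentioned in the section on pretest probability; the function name is our own.

```python
import math


def posttest_probability(pretest_probability, likelihood_ratio):
    """Return the posttest probability given a pretest probability and an LR."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    if math.isinf(likelihood_ratio):
        return 1.0  # an infinite LR makes the disease certain
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Same hypothetical LR+ applied to a low and a high pretest probability.
for pretest in (0.01, 0.50):
    posttest = posttest_probability(pretest, likelihood_ratio=8.5)
    print(f"pretest {pretest:.0%} -> posttest {posttest:.1%}")
```

With this hypothetical LR+, a pretest probability of 1% rises only to about 8%, whereas a pretest probability of 50% rises to about 89%, which illustrates why the same test performs differently in an unselected population and in a memory clinic.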

The Effect of Positive and Negative Likelihood Ratios on Pretest Probability

Classification of Evidence

The classification of evidence was predetermined in the following way. First, we defined "general criteria" that had to be fulfilled to be assigned the highest class of evidence. Then the number of high-quality articles determined the class of evidence (1 of 4) to which the specific article (study) should belong. That is described in Table 3.

Definition of Evidence Grades


After the reviewing procedure, we selected 36 articles relevant to MRI. The primary reasons for exclusion were (a) too few subjects included, (b) insufficient description of investigative procedure or diagnostic criteria, (c) insufficient description of statistical methods, or (d) irrelevance.

The accepted articles that were evaluated are presented in detail in Table 4. LR+ and LR− are presented based on the sensitivity and specificity values given in the text of the reviewed article.

Reviewed Articles Listed According to Study Quality

The breakdown of articles according to quality is shown in Table 5.

Breakdown of the Reviewed Articles According to Quality of Study

Generally speaking, the articles presented a limited number of cases, with a strong focus on AD versus controls. Few studies were from a longitudinal prospective point of view. The main brain structures studied were parts of the medial temporal lobe. That is most evident in articles from 1990 until the present, probably reflecting the increased use of MRI, which allows for better visualization of the medial temporal lobes than CT.

Contrast Groups

The most common contrast was AD compared with healthy controls (20 studies). "Other dementias," defined as vascular dementia, frontotemporal lobar degeneration, and dementia with Lewy bodies, was contrasted with AD in 5 studies. Finally, mild cognitive impairment (MCI) was compared with both healthy controls and AD in 6 studies.

One study reported on Creutzfeldt-Jakob disease and variant Creutzfeldt-Jakob disease and the pulvinar sign.27 Pulvinar sign is the increased signal intensity found in the pulvinar part of thalamus in cases with variant Creutzfeldt-Jakob disease. That yields a specificity value of 1.0 with an infinite positive likelihood ratio.

Methods for Estimation of Brain Volumes

A variety of methods were used for assessing atrophy in both the whole brain and local brain structures. Visual rating, linear measurements, and volumetry based on segmentation and/or manual outlining were presented (Table 6). It was difficult to find systematic use of any particular method, but volumetry was most often presented (17 of 36 articles).

Methods Used for Estimation of Regional Brain and/or Brain Volumes Presented in the Reviewed Articles

Volume measurements were not systematically related to a specific brain structure, but areas of the medial temporal lobes were most often reported, with HC and EC being the most common.

Cutoff values were given in a minority of the studies (References 16, 17, 42). The sensitivity and specificity values were usually extracted from discriminant analysis models. Thus, it was difficult to report cutoff values for 1 specific structure (which would have been important for practical clinical reasons).

Visual rating was used in 6 studies (References 8, 18, 19, 24, 32, and 34). Three of them used the same rating scale (References 18, 19, and 24). Patients with AD were contrasted with controls and subjects with bipolar disorders, as well as patients with dementia with Lewy bodies (DLB) and OD. Generally, a large variation in LR+/− values was found. When AD was compared with controls, high values were reported (LR+ of 7-42).

Likelihood Ratio Values

Eleven studies compared AD with healthy controls. Table 7 shows the results of evaluating the EC and HC.

Likelihood Ratios of Magnetic Resonance Imaging-based Volumetrics

Evaluation of the HC was reported in 11 studies. LR+ values ranged from 3 to 17, whereas LR− values ranged from 0.11 to 0.91.

Evaluation of the EC was reported in 4 studies, with LR+ ranging from 4 to 33 and LR− ranging from 0.21 to 0.76.

Nine of the 11 studies fulfilled the general criteria for evidence (sensitivity, >80%; specificity, >80%; LR+, >5). From Table 7A, it is obvious that there is moderately strong evidence (grade 2) that the volume of HC contributes to distinguishing AD patients from healthy controls.
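The three thresholds quoted above can be written as a simple check. This is a hedged sketch only: the function name and the input values are hypothetical, and the reviewed studies were assessed from their published figures rather than with code. Note that the LR+ criterion is not redundant, because sensitivity and specificity just above 80% give an LR+ only slightly above 4.

```python
import math


def meets_general_criteria(sensitivity, specificity):
    """Check sensitivity > 80%, specificity > 80%, and LR+ > 5 (inputs as fractions)."""
    if specificity >= 1.0:
        lr_pos = math.inf  # no false positives
    else:
        lr_pos = sensitivity / (1.0 - specificity)
    return sensitivity > 0.80 and specificity > 0.80 and lr_pos > 5.0


# Hypothetical examples.
print(meets_general_criteria(0.85, 0.90))  # all three thresholds exceeded
print(meets_general_criteria(0.81, 0.81))  # LR+ is about 4.3, so the check fails
print(meets_general_criteria(0.75, 0.95))  # sensitivity too low
```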

None of the 4 studies on the EC were assigned sufficient study quality. We found limited evidence (grade 3) that the EC contributes to distinguishing AD from controls.

Six studies compared OD with AD (Table 8).

Likelihood Ratio of Magnetic Resonance Imaging-based Volumetrics

A mixture of volumetry, visual rating, and linear measurement was presented. The LR+ values ranged from 1.5 to 41, whereas the LR− values ranged from 0.24 to 0.9. The study qualities were 2a and 2b. No reliable conclusions could be drawn.

Mild cognitive impairment was evaluated in 6 studies (Table 9). Mild cognitive impairment was contrasted with healthy controls and AD. Entorhinal cortex and HC were used for volume measurements. LR+ values ranged from 2 to 13, whereas LR− values ranged from 0.21 to 0.84.

Likelihood Ratios of Magnetic Resonance Imaging-based Volumetry, Mild Cognitive Impairment Versus Controls and Mild Cognitive Impairment Versus Alzheimer Disease Are Presented

There was limited evidence (grade 3) that analyses of hippocampal volumes contribute to distinguishing AD from MCI.

White Matter Changes

By definition, a diagnosis of vascular dementia implies the presence of white matter changes. We did not find any articles addressing white matter changes or infarcts as a tool to diagnose vascular dementia (or any other dementia).


Most articles investigated hospital populations, and only 3 studies took a population-based approach (References 15, 16, and 17).

Circularity Bias

Very few studies explicitly stated that the evaluation of the method was performed independently of the imaging results (References 15 and 18). Because a structural image is usually routine in clinical workup, we assumed that the risk of dependence between study results and diagnostic procedures was high.

A limited number of research groups have contributed to the scientific literature, and several authors have contributed more than 1 article. Whether the same population was used in the various studies was not obvious from the individual articles.


This review investigated the usefulness of MRI as a diagnostic tool to differentiate AD from other dementia disorders and healthy aging. Thirty-six articles were selected from more than 400. We investigated the sensitivity, specificity, and likelihood ratios of the MRI tests. A variety of methods for volume estimation were used, while many different brain areas were measured. Thus, it was difficult to evaluate a specific volume method in a particular brain area. However, we were able to extract information from volumetry of the HC and EC to distinguish between AD patients and controls.

We found moderately strong evidence (grade 2) that atrophy of the HC, estimated with MRI-based volumetry, contributes to the diagnostic workup that differentiates AD patients from controls. The conclusion is based on studies that used clinical diagnosis as the gold standard and those that were conducted at memory clinics. The importance of the setting (primary care or a highly specialized unit of a university hospital) must be emphasized. In primary care, patients are more similar to the general population in terms of background characteristics, whereas patients at university clinics tend to be highly selected by means of a referral procedure. This difference places varying demands on the diagnostic instruments.

In this review, most studies were of quality 2b, reflecting populations from memory clinics, usually at university hospitals. No study could be referred to a general practice setting. Of the 3 studies with quality 1a, one was population based, whereas the other two were at university hospitals with heterogeneous study populations. Two of them investigated AD patients versus controls, whereas two investigated OD patients versus controls. Both comparisons yielded high LR+ values. We cannot extrapolate these findings into general practice settings, given that we found no studies that used medial temporal lobe atrophy or any other brain structure as a diagnostic method in such settings.

It is interesting to note that visual rating resulted in the same magnitude of LR+/− values as volumetry. That might imply that a clinically useful method (fast and easy to apply) for estimating medial temporal lobe atrophy is as good as more complicated and "unclinical" methods such as volumetry for distinguishing AD patients from healthy controls.

Cutoff values were rarely given together with sensitivity and specificity. Thus, it was not possible to investigate the influence of various cutoff values on LR+. That represents a major limitation, given that cutoff values of volumes based on data from relevant normal subjects are needed in clinical radiological practice. Without such cutoff values, MRI cannot be clinically useful as a diagnostic tool.

Many of the studies provided no information as to whether the results of the "MRI test" were assessed independently of the outcome. The question of workup bias (unreliable results because of the outcome of the imaging method having influenced the diagnostic procedure) must be considered. Knowledge about predictor results may have influenced the diagnosis, thereby confounding the estimation of sensitivity, specificity, and likelihood ratio.


For diagnostic workups performed at specialized settings, MRI-based evaluation of atrophy of medial temporal lobe structures (HC) contributes to diagnostic accuracy.

These findings have not yet been translated into clinical radiological practice. There is a need for reliable cutoff values obtained from large population-based cohorts of healthy elderly subjects. There is a lack of studies that provide information concerning these MRI-based volume evaluations at the general practice level, where the pretest probability of dementia is low and the diagnostic accuracy of the clinical criteria is also lower. Perhaps MRI should be used primarily in specialized settings, such as memory clinics.


The authors thank Dr Olof Edhag and Dr Anders Norlund. The authors also thank Anette Eidehall for her invaluable help with the manuscript.


1. American Psychiatric Association. DSM-IV, Diagnostic and Statistical Manual of Mental Disorders. Washington, DC: American Psychiatric Association; 1994.
2. World Health Organization. ICD-10, World Health Organization Tenth Revision of the International Classification of Diseases. Geneva: WHO; 1992.
3. Waldemar G, Dubois B, Emre M, et al. Diagnosis and management of Alzheimer's disease and other disorders associated with dementia. The role of neurologists in Europe. European Federation of Neurological Societies. Eur J Neurol. 2000;7(2):133-144.
4. Practice parameter for diagnosis and evaluation of dementia (summary statement). Report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology. 1994;44(11):2203-2206.
5. Knopman DS, DeKosky ST, Cummings JL, et al. Practice parameter: diagnosis of dementia (an evidence-based review). Report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology. 2001;56(9):1143-1153.
6. Qizilbash N, Schneider L, Chui H, et al. Diagnosis in Evidence-based Dementia Practice. Oxford, UK: Blackwell Publishing Co; 2002:81-85.
7. Killiany RJ, Hyman BT, Gomez-Isla T, et al. MRI measures of entorhinal cortex vs hippocampus in preclinical AD. Neurology. 2002;58(8):1188-1196.
8. de Leon MJ, Golomb J, George AE, et al. The radiologic prediction of Alzheimer disease: the atrophic hippocampal formation. AJNR Am J Neuroradiol. 1993;14(4):897-906.
9. Xu Y, Jack CR Jr, O'Brien PC, et al. Usefulness of MRI measures of entorhinal cortex versus hippocampus in AD. Neurology. 2000;54(9):1760-1767.
10. Laakso M, Soininen H, Partanen K, et al. The interuncal distance in Alzheimer disease and age-associated memory impairment. AJNR Am J Neuroradiol. 1995;16(4):727-734.
11. Laakso MP, Partanen K, Lehtovirta M, et al. MRI of amygdala fails to diagnose early Alzheimer's disease. Neuroreport. 1995;6(17):2414-2418.
12. El Fakhri G, Kijewski MF, Johnson KA, et al. MRI-guided SPECT perfusion measures and volumetric MRI in prodromal Alzheimer disease. Arch Neurol. 2003;60(8):1066-1072.
13. Kantarci K, Xu Y, Shiung MM, et al. Comparative diagnostic utility of different MR modalities in mild cognitive impairment and Alzheimer's disease. Dement Geriatr Cogn Disord. 2002;14(4):198-207.
14. Bottino CM, Castro CC, Gomes RL, et al. Volumetric MRI measurements can differentiate Alzheimer's disease, mild cognitive impairment, and normal aging. Int Psychogeriatr. 2002;14(1):59-72.
15. Pennanen C, Kivipelto M, Tuomainen S, et al. Hippocampus and entorhinal cortex in mild cognitive impairment and early AD. Neurobiol Aging. 2004;25(3):303-310.
16. Jobst KA, Barnetson LP, Shepstone BJ. Accurate prediction of histologically confirmed Alzheimer's disease and the differential diagnosis of dementia: the use of NINCDS-ADRDA and DSM-III-R criteria, SPECT, x-ray CT, and Apo E4 in medial temporal lobe dementias. Oxford Project to Investigate Memory and Aging. Int Psychogeriatr. 1998;10(3):271-302.
17. Rossi R, Joachim C, Smith AD, et al. The CT-based radial width of the temporal horn: pathological validation in AD without cerebrovascular disease. Int J Geriatr Psychiatry. 2004;19(6):570-574.
18. Wahlund LO, Julin P, Johansson SE, et al. Visual rating and volumetry of the medial temporal lobe on magnetic resonance imaging in dementia: a comparative study. J Neurol Neurosurg Psychiatry. 2000;69(5):630-635.
19. O'Brien JT, Desmond P, Ames D, et al. Temporal lobe magnetic resonance imaging can differentiate Alzheimer's disease from normal ageing, depression, vascular dementia and other causes of cognitive impairment. Psychol Med. 1997;27(6):1267-1275.
20. Lavenu I, Pasquier F, Lebert F, et al. Association between medial temporal lobe atrophy on CT and parietotemporal uptake decrease on SPECT in Alzheimer's disease. J Neurol Neurosurg Psychiatry. 1997;63(4):441-445.
21. Denihan A, Wilson G, Cunningham C, et al. CT measurement of medial temporal lobe atrophy in Alzheimer's disease, vascular dementia, depression and paraphrenia. Int J Geriatr Psychiatry. 2000;15(4):306-312.
22. Varma AR, Adams W, Lloyd JJ, et al. Diagnostic patterns of regional atrophy on MRI and regional cerebral blood flow change on SPECT in young onset patients with Alzheimer's disease, frontotemporal dementia and vascular dementia. Acta Neurol Scand. 2002;105(4):261-269.
23. O'Brien JT, Metcalfe S, Swann A, et al. Medial temporal lobe width on CT scanning in Alzheimer's disease: comparison with vascular dementia, depression and dementia with Lewy bodies. Dement Geriatr Cogn Disord. 2000;11(2):114-118.
24. Barber R, Gholkar A, Scheltens P, et al. Medial temporal lobe atrophy on MRI in dementia with Lewy bodies. Neurology. 1999;52(6):1153-1158.
25. Kitagaki H, Mori E, Hirono N, et al. Alteration of white matter MR signal intensity in frontotemporal dementia. AJNR Am J Neuroradiol. 1997;18(2):367-378.
26. Shonk TK, Moats RA, Gifford P, et al. Probable Alzheimer disease: diagnosis with proton MR spectroscopy. Radiology. 1995;195(1):65-72.
27. Zeidler M, Sellar RJ, Collie DA, et al. The pulvinar sign on magnetic resonance imaging in variant Creutzfeldt-Jakob disease. Lancet. 2000;355(9213):1412-1418.
28. Bigler ED, Tate DF. Brain volume, intracranial volume, and dementia. Invest Radiol. 2001;36(9):539-546.
29. Gosche KM, Mortimer JA, Smith CD, et al. Hippocampal volume as an index of Alzheimer neuropathology: findings from the Nun Study. Neurology. 2002;58(10):1476-1482.
30. O'Brien JT, Ames D, Desmond P, et al. Combined magnetic resonance imaging and single-photon emission tomography scanning in the discrimination of Alzheimer's disease from age-matched controls. Int Psychogeriatr. 2001;13(2):149-161.
31. Laakso MP, Soininen H, Partanen K, et al. MRI of the hippocampus in Alzheimer's disease: sensitivity, specificity, and analysis of the incorrectly classified subjects. Neurobiol Aging. 1998;19(1):23-31.
32. Erkinjuntti T, Lee DH, Gao F, et al. Temporal lobe atrophy on magnetic resonance imaging in the diagnosis of early Alzheimer's disease. Arch Neurol. 1993;50(3):305-310.
33. Laakso MP, Hallikainen M, Hanninen T, et al. Diagnosis of Alzheimer's disease: MRI of the hippocampus vs delayed recall. Neuropsychologia. 2000;38(5):579-584.
34. O'Brien JT, Desmond P, Ames D, et al. The differentiation of depression from dementia by temporal lobe magnetic resonance imaging. Psychol Med. 1994;24(3):633-640.
35. Juottonen K, Laakso MP, Insausti R, et al. Volumes of the entorhinal and perirhinal cortices in Alzheimer's disease. Neurobiol Aging. 1998;19(1):15-22.
36. Frisoni GB, Beltramello A, Geroldi C, et al. Brain atrophy in frontotemporal dementia. J Neurol Neurosurg Psychiatry. 1996;61(2):157-165.
37. Golebiowski M, Barcikowska M, Pfeffer A. Magnetic resonance imaging-based hippocampal volumetry in patients with dementia of the Alzheimer type. Dement Geriatr Cogn Disord. 1999;10(4):284-288.
38. DeCarli C, Murphy DG, McIntosh AR, et al. Discriminant analysis of MRI measures as a method to determine the presence of dementia of the Alzheimer type. Psychiatry Res. 1995;57(2):119-130.
39. Frisoni GB, Geroldi C, Beltramello A, et al. Radial width of the temporal horn: a sensitive measure in Alzheimer disease. AJNR Am J Neuroradiol. 2002;23(1):35-47.
40. Koslow SA, Swihart AA, Latchaw RE, et al. Quantitative computer tomography in Alzheimer's disease: a re-evaluation. Gerontology. 1992;38(3):174-184.
41. Du AT, Schuff N, Zhu XP, et al. Atrophy rates of entorhinal cortex in AD and normal aging. Neurology. 2003;60(3):481-486.
42. Gao FQ, Black SE, Leibovitch FS, et al. Linear width of the medial temporal lobe can discriminate Alzheimer's disease from normal aging: the Sunnybrook Dementia Study. Neurobiol Aging. 2004;25(4):441-448.

Keywords: dementia workup; Alzheimer disease; magnetic resonance imaging; sensitivity; specificity; likelihood ratios; evidence-based medicine; diagnosis; medial temporal lobe

© 2005 Lippincott Williams & Wilkins, Inc.