In clinical research, studies can be classified into descriptive and analytic studies. Descriptive studies are always observational studies and describe general disease characteristics related to person, place, and time. They include cross-sectional studies, surveillance studies, ecological correlational studies, case reports, and case series1. Analytic studies test a hypothesis about a causal relation between exposure and outcome2. They can be observational, such as case-control and cohort studies, or controlled, such as the randomized controlled trial. In the randomized controlled trial, the intervention or treatment is randomly assigned by the researcher, while in observational studies, the surgeon and/or the patient decide on which treatment is given, as happens in routine care.
The results of randomized controlled trials are considered the highest level of evidence because randomization controls for prognostic factors between two comparison groups, thereby minimizing the role of confounding bias and optimizing the internal validity. In the hierarchy of evidence, randomized controlled trials (Level I) are followed by cohort studies (Level II), case-control studies (Level III), case series (Level IV), and expert opinion (Level V)2.
Even though a randomized controlled trial carries the highest level of evidence (Level I), for technical or ethical reasons, it is not an appropriate design for all clinical questions2-4. For example, in fracture-healing research, it would be unethical to randomize patients with grossly contaminated open tibial shaft fractures to either early or delayed treatment. In these situations, observational studies are the best alternative to study the efficacy and safety of a certain intervention.
Cohort studies, case-control studies, and case series are all types of observational studies. Cohort and case-control studies differ from case series in that they make use of a comparison group in the analysis of the treatment effect on the outcomes. Case series belong to a group of descriptive studies that do not test the hypothesis of treatment efficacy4. A case series follows a group of patients who have a similar diagnosis or who are undergoing the same procedure over a certain period of time.
Although case series have methodological limitations with regard to making causal inferences about the relation between treatment and outcome, they can be helpful in generating a hypothesis that can be tested in further analytic studies2. This article will discuss the main purposes and the major strengths and limitations of a case series. Principles are provided for its design, analysis, and reporting.
When to Consider a Case Series?
Despite its negligible role in assessing treatment efficacy, a case series is suitable for a more cautious description of interventions in several settings. First, case series serve as a means of initially reporting on novel diagnostic or therapeutic strategies, particularly when the option of waiting for comparative evidence is considered unacceptable. Second, patient registries can be a tool for summarizing the outcomes in a certain patient category (for example, all patients with tibial fractures)4.
The primary purpose of a case series should be the generation of hypotheses that subsequently can be tested in studies of greater methodological rigor. Put simply, a case series can be seen as a screening tool for sensible hypotheses that are worthy of further examination. Treatment safety and diagnostic accuracy are the principal outcomes that can be assessed fairly and reliably in a case series. In the assessment of either outcome, no control group is necessary and long-term follow-up can be obtained readily, especially in a retrospective design. Information regarding the characterization of disease patterns in terms of natural history, recovery, and prognostic factors may help a researcher determine the sample size, relevant covariates, and length of follow-up that should be used in a subsequent randomized controlled trial5. Rather than being an antecedent of more valid research, case series may also be an alternative or a valuable adjunct4,6.
Strengths and Limitations of a Case Series
As with all study designs, case series have strengths and limitations (Table I).
TABLE I - Strengths and Limitations of a Case-Series Design
Strengths
  High external validity
    No interference in treatment decision process
    Wide range of patients
  Study conduct takes little time
Limitations
  Lack of comparison group
  Data collection often incomplete
  Susceptible to bias
    Selection bias
    Measurement bias
In all observational studies, including case series, the study investigators do not control which intervention(s) the research participant receives. The advantage of this approach over a randomized controlled trial is that the study results are closer to those obtained in routine clinical practice and may therefore be considered more relevant5,7. A higher relevance (or external validity) means that the results can be better applied to clinical practice in other centers. Another advantage of not interfering in the treatment decision is that the surgeons are not forced to perform an operation with which they are less experienced.
The external validity is also high in a case series that includes a diverse range of patients. By including patients with different characteristics and co-interventions, the study sample is more likely to be representative of the population of interest. In a randomized controlled trial, however, relatively stringent inclusion criteria and selection of only those patients who wish to participate decrease the extent to which the results can be applied to common clinical practice8,9.
In contrast with randomized controlled trials, case series do not include a comparison group, nor is any form of randomization employed. The case series is therefore a relatively efficient and cost-saving design. In a case series, the decision about the treatment regimen remains with the surgeon and the patient. This is an obvious advantage when ethical considerations prohibit the randomization of a patient to a nontreatment group or to a treatment that would be inconsistent with common standards of orthopaedic practice (e.g., a placebo-treatment group).
The primary limitation of a case series is its lack of a comparison (control) group. A control group is a group of patients who share all of the characteristics of the patients of the treatment group except that they do not receive the treatment. When a study lacks a control group, no causal inferences should be made about the relationship between the treatment and the outcomes, since it is impossible to determine whether the outcomes are attributable to the treatment effect or to other patient characteristics. As a result, hypotheses can only be made about apparent relationships.
A case series is often based on retrospective observation of patient data. Conducting a case series prospectively (i.e., the hypothesis precedes data collection) or retrospectively (vice versa) makes a difference as to the extent of selection and measurement bias encapsulated in the observations. In contrast to the method of operation of a randomized controlled trial, rigorously developed protocols and a dedicated single investigator (for example, a research nurse) are absent in a retrospective design, which may decrease the completeness of inclusion, data collection, and patient follow-up. Additionally, data are not measured in a standardized way, thereby increasing the measurement bias. Further, the researcher is dependent on whatever outcomes have been measured in the course of routine care and cannot select a suitable and valid outcome measure as is possible in a prospective design7.
Similar to other observational study designs, case series are prone to bias. Selection bias occurs when follow-up data are less likely to be collected from patients with a better or worse outcome. For example, when the only subjects included in a study are those who are available for a follow-up of at least six months, patients who died or changed hospitals before six months are not included. These patients are likely to have had worse outcomes than the ones who survived, so the selection of patients is therefore biased.
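The effect of such a follow-up requirement can be illustrated with a minimal sketch in Python. The cohort, the 0-to-100 outcome score, and the function name are hypothetical and chosen only for illustration; in a real series, the outcomes of patients who are lost to follow-up would be unknown.

```python
from statistics import mean

# Hypothetical cohort (illustrative values, not data from any study):
# each tuple is (outcome score from 0 to 100, available at six months?).
cohort = [
    (85, True), (78, True), (90, True), (72, True), (88, True), (80, True),
    (40, False), (35, False), (55, False), (30, False),
]


def mean_outcome(patients, completers_only):
    """Mean outcome score, optionally restricted to patients with follow-up."""
    scores = [score for score, followed in patients
              if followed or not completers_only]
    return mean(scores)


# Mean that a series restricted to six-month completers would report.
observed_mean = mean_outcome(cohort, completers_only=True)
# Mean over every treated patient, which the restricted series never sees.
full_cohort_mean = mean_outcome(cohort, completers_only=False)

print(f"completers only: {observed_mean:.1f}")
print(f"all treated patients: {full_cohort_mean:.1f}")
```

Under these assumed values, restricting the analysis to patients with six months of follow-up raises the reported mean, because the patients who were lost had the worse outcomes.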
Measurement bias can arise when different methods of outcome measurement are used in a study. Compared with case series, measurement bias might be more likely to occur in case-control and cohort studies, because the difference between the treatment and control groups makes it more likely for investigators to gather information in different ways.
Selection and measurement bias should theoretically occur as much in case series as in randomized controlled trials, because randomized controlled trials only control for confounding bias7. Confounding bias occurs in the presence of confounders, that is, factors that distort the true relationship of the study variable of interest by virtue of also being related to the outcome of interest10. Confounding bias is not present in case series for the simple reason that there is no control group. However, because the protocols associated with randomized controlled trials are typically better developed and of higher quality than those of case series with regard to methods to control for bias, selection and measurement bias may be less common in randomized controlled trials.
Criteria for a Good Case Series
Specific guidelines for planning, conducting, and reporting a case series are presented below1,4,5,7. We recommend reporting on how each of these guidelines was implemented when writing a paper on a case series11.
As with other study designs, the study question should be focused and appropriate for a case series. Specifically, the question should not be whether the investigated treatment is more effective or safer than another treatment. The study question should list (1) its study population, (2) the intervention, and (3) the primary outcome.
The case definition should mention inclusion and exclusion criteria, which should be based on widely used, preferably validated, definitions. If authors use their own criteria, definition and justification are necessary to enable the reader to compare the studied population with his or her own patients. A consecutive inclusion of patients optimizes external validity. It may be tempting to include patients seen over a long period of time to increase sample size. However, the use of a short inclusion period minimizes known and unknown changes over time in co-interventions, prognosis, and even in the intervention under study.
A detailed description of the intervention and the co-intervention should be stated. This will ensure repeatability of the study by other investigators. It is very important to thoroughly describe co-interventions (for example, postoperative mobilization and physical therapy), as these are not always standardized among study centers as is frequently implied by vague statements that patients received “standardized operative or postoperative care.” Additionally, indications for the studied treatment should be explained. This will primarily determine the consistency of the patient group.
The most important outcomes in care are those that measure patient satisfaction, relief of symptoms, and a feeling of well-being. An example is the Short Form-36 questionnaire, which not only measures physical function but also mental well-being12. Including only clinical measurements would not represent the subjective nature of patient care. Further, outcome measurements should be valid and reliable. Validity refers to the degree to which the data measure what they were intended to measure. An example of a valid measure is the Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index, which correlates well with earlier established instruments of pain, stiffness, and physical function13. Reliability refers to the extent to which repeated measurements of a stable phenomenon yield similar results. The general impression of radiographic healing of tibial fractures is an example of a measurement with a high interobserver agreement14.
The blinding of outcome assessors is ideal in every kind of research design and can be implemented quite usefully in case-series studies (e.g., by having some investigators collect data only on outcome and others collect data only on patient characteristics). This prevents the investigators' measurements from being influenced (intentionally or unintentionally) by their personal treatment preference.
The method of data acquisition (telephone interview, clinical measurement, or chart review) should be addressed in the study report for the sake of repeatability and the appraisal of measurement bias. A further criterion of a well-measured outcome is the minimal length of follow-up. Sufficient time should be given for complications to develop and be recorded. For example, a maximal follow-up time of six months in a study of health-related quality of life after a tibial fracture would underestimate the patients' eventual health status, since patients continue to improve for up to a year, and sometimes longer, after the occurrence of a fracture.
As the design of a case series is descriptive, only descriptive statistics should be used15. That is, no comparative tests yielding p values should be done. By reporting summary statistics only, the author errs on the conservative side of speculation and avoids misleading his or her colleagues with probability statements that the design cannot support.
As case series have many methodological limitations, the limitations of their findings should be described. First, a statement of the external validity of the obtained data should be given. This includes (1) patient characteristics and (2) completeness of follow-up. Additionally, the presence of chance and the presence, direction, and magnitude of bias should be acknowledged. Authors who only emphasize the resemblance to routine clinical practice greatly mislead their audience by withholding from them information on internal validity.
Patients may differ according to prognostic variables, such as age, etiology, and disease severity among geographical regions. This may complicate comparisons with other reports or explain discrepancies. For example, a relatively greater contribution of blunt trauma to all observed severe injuries in Europe may explain why the most predictive trauma score differs from that which can be found in North American studies16. The follow-up rates and reasons for loss to follow-up should be stated. Completeness of follow-up varies considerably among similar case series9, making it difficult for readers to compare them. Therefore, authors should be cautious when interpreting their own results in relation to results of apparently similar case series.
Most importantly, no absolute conclusions on the studied treatment should be stated. As mentioned before, the lack of a comparison group prohibits any hypothesis from being tested. Valid conclusions basically repeat the descriptive study findings, for example, “our patients treated by treatment X showed good outcome Y after Z months of follow-up.” Stating that “treatment X is better than treatment Y” or even that “treatment X is effective” would be invalid. Inferring such a finding could lead a reader to use treatment X in the care of a patient, a decision that would seem to be evidence-based but that in reality is not and could lead to an unfavorable outcome.
Example 1: A Case Series Followed by Research with Higher Level of Evidence
Extramedullary fixation with a sliding hip screw has shown good clinical results in the treatment of pertrochanteric fractures. However, because of some reported complications, the intramedullary gamma nail, which theoretically has biomechanical advantages over a sliding hip screw, was developed17-19.
Chevalley and Gamba presented a case series of sixty-three patients who were treated for a pertrochanteric hip fracture with a gamma nail20. Data for this study were collected prospectively, and patients were followed until the end of treatment for an average of 7.2 months. The quality of reduction was classified, on the basis of the radiographic fracture gap, as good in thirty-eight patients, acceptable in nineteen patients, and unsatisfactory in six patients. On average, full weight-bearing was allowed by the twenty-fourth day after the procedure, and fracture consolidation was seen radiographically at an average of 3.8 months. In their discussion, the authors state their belief that use of the gamma nail avoids delays in full weight-bearing, shortens the procedure to treat unstable intertrochanteric fractures, and diminishes the risk of nonunion.
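As a minimal sketch of the purely descriptive reporting that suits a case series, the reduction-quality counts given above can be expressed as proportions of the sixty-three patients. The variable names are illustrative assumptions, and no comparative test is performed.

```python
# Counts of reduction quality as reported for the series of sixty-three patients.
reduction_quality = {"good": 38, "acceptable": 19, "unsatisfactory": 6}

# Total number of classified patients.
total_patients = sum(reduction_quality.values())

# Descriptive summary only: the percentage of patients in each category.
percentages = {
    category: round(100 * count / total_patients, 1)
    for category, count in reduction_quality.items()
}

for category, pct in percentages.items():
    print(f"{category}: {reduction_quality[category]}/{total_patients} ({pct}%)")
```

Reporting the raw counts alongside the percentages lets readers judge for themselves how much weight a small denominator can bear.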
Several strengths are recognized in the design of this case series. First, the prospective nature of the study made it more likely that the data that were collected would be complete. Also, consecutive patients were included in this series. Selection bias was limited because all patients who received treatment with gamma nails in the observed period were included in the study. The authors rightly did not overstep the goal of a case series, as their goal was to present their results with gamma nailing. However, they did not describe their main outcomes of interest in their methods section and did not describe the criteria that were used to assess the outcomes. Because patients were followed until the end of treatment, this study is complete with regard to the reporting of potential complications associated with the treatment. However, no definition of the end of treatment was given by the investigators. The quality of reduction was assessed on the basis of classification of fracture-gap size as determined by the investigators. The authors did not justify this decision; however, they did define the categories. The postoperative results are therefore easily interpreted by readers who may not be familiar with this type of classification.
No causal inferences are made, as the authors do not claim that gamma nails are superior to sliding hip-screw systems in general and include the words “in our experience” in their conclusion. Also, results from other series are mentioned in the conclusion, but the authors acknowledge that comparison with these series would not be appropriate.
Because a lack of consensus exists among observational studies regarding the optimal operative treatment of pertrochanteric fractures, further studies with a higher level of evidence were needed. Several randomized controlled trials were performed, comparing gamma-nail fixation with sliding hip-screw fixation, and a recent meta-analysis summarized the evidence from these trials21. The primary conclusion from this meta-analysis was that there was no advantage of gamma nail fixation over sliding hip-screw fixation with regard to mortality, nonunion, and complications. As a result, surgeons are now better prepared to make an evidence-based decision on the treatment of pertrochanteric fractures.
Example 2: Treatment Evaluation by Functional Outcome
Agreement exists on the need for operative treatment of tibial avulsion fractures of the posterior cruciate ligament, but not on the optimal surgical approach22,23. Nicandri et al. recently presented a retrospective case series on patients who underwent a modified open posterior approach24. Their purpose was to explore long-term functional status following this new procedure, including clinical (Musculoskeletal Function Assessment questionnaire, posterior drawer test, and knee range of motion) and radiographic (fracture-healing) outcomes.
In their introduction, the authors stated that an open procedure was indicated for some patients with posterior cruciate ligament avulsion, but they did not specify their inclusion criteria. As a result, trauma surgeons are unable to appreciate the clinical relevance.
The operative procedure and the postoperative treatment were very extensively described, and the clinical outcomes were all validated (except for the knee range of motion, which was measured with use of the commonly used goniometer). In contrast, no definition of radiographic healing was given. While there is controversy regarding the best method of monitoring radiographic healing25, at least one accepted method should have been used.
In this case series, the external validity is compromised because of the retrospective study design. From an already small sample, eight of eighteen patients were eliminated because of missing data.
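The extent of this loss can be made explicit with a short calculation. The sketch below assumes only the figures given above (eighteen patients, eight eliminated); the function name is a hypothetical helper.

```python
def completeness_of_follow_up(n_treated, n_excluded):
    """Fraction of treated patients who remain in the analysis."""
    if n_treated <= 0 or not 0 <= n_excluded <= n_treated:
        raise ValueError("counts must satisfy 0 <= n_excluded <= n_treated, n_treated > 0")
    return (n_treated - n_excluded) / n_treated


# Eight of eighteen patients were eliminated because of missing data.
completeness = completeness_of_follow_up(n_treated=18, n_excluded=8)
print(f"analyzed: {18 - 8}/18 ({100 * completeness:.1f}%)")
```

Only ten of the eighteen patients, a little over half, contribute to the reported outcomes.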
The average Musculoskeletal Function Assessment score was 14, which is favorable. Two patients had grade-II laxity, and eight patients had grade-I laxity, which is favorable as well.
Only the p values of tests comparing the range of motion of the affected and the contralateral knees were presented, rather than raw data on affected knee range of motion. Since the sample size is very small, no significant differences could be detected, falsely suggesting that knee functions were intact.
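Why a nonsignificant result in a very small sample says little about the absence of a difference can be sketched with an exact two-sided sign test on paired knees. The counts below are hypothetical, and the sign test is used only as a simple stand-in; it is not necessarily the test that the authors applied.

```python
from math import comb


def sign_test_two_sided(n_worse, n_pairs):
    """Exact two-sided sign test p value under a 50:50 null hypothesis."""
    # Use the larger of the two tail counts so the test is symmetric.
    k = max(n_worse, n_pairs - n_worse)
    upper_tail = sum(comb(n_pairs, i) for i in range(k, n_pairs + 1)) / 2 ** n_pairs
    return min(1.0, 2 * upper_tail)


# Hypothetical: the affected knee has less motion in 8 of 10 patients.
p_small_sample = sign_test_two_sided(n_worse=8, n_pairs=10)
# The same 80% proportion in a hypothetical series four times as large.
p_large_sample = sign_test_two_sided(n_worse=32, n_pairs=40)

print(f"n = 10: p = {p_small_sample:.3f}")
print(f"n = 40: p = {p_large_sample:.5f}")
```

Under these assumptions, the same proportion of affected knees fails to reach the conventional 0.05 threshold with ten patients but is clearly below it with forty, which is why raw range-of-motion data are more informative than p values in a small series.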
In their discussion, the authors used their data to recommend the use of early postoperative mobilization. Although the authors mentioned the limitations of their case series, they concluded that their operative technique “results in good clinical, radiographic and functional outcomes.” Although the clinical results are believable, it is not clear to which type of patients they apply. In addition, a substantial number of patients who underwent the procedure were excluded. Therefore, and because no control group was used, the authors' conclusion that the outcomes were the result of their surgical approach is invalid.
Example 3: A Case Series Designed to Assess the Safety of an Intervention
Operative fixation of displaced supracondylar humeral fractures in children is performed to maintain anatomical reduction and to minimize the complication rate associated with nonoperative management. The best approach to Gartland26 type-II fractures remains controversial.
Skaggs et al. conducted a case series in which the purpose was to determine the complication rate, both surgical and anesthetic, of the operative treatment (closed reduction and percutaneous pinning) of type-II supracondylar fractures in children27. Only patients with complete medical records and a minimum follow-up of six weeks were included. Data were acquired from a review of radiographs and clinical notes, and patients were classified as excellent, good, fair, or poor on the basis of the criteria described by Flynn et al.28. According to these criteria, 181 patients had excellent results, six had good results, and two had fair results. There were no surgical or anesthetic complications noted in this series.
The primary limitation of this case series was the short follow-up period. This case series only used a minimum of six weeks of follow-up, which would be far too short a time for the occurrence of some surgical complications (such as nonunion) that were assigned as outcomes in this series.
Furthermore, the authors were not consistent in stating the purpose of their study. While they emphasized determination of the complication rate in their introduction and discussion, their overall recommendations also contained suggestions about the efficacy of the treatment (“satisfactory outcome”). Also, selection bias might be present, since only patients with complete medical records and a minimum of six weeks of follow-up were included.
In the methods section, the authors provided a detailed description of the data that were collected and they provided a reference for the Flynn classification system. As in all case series, the type of treatment was chosen by the surgeon, and, as a result, different pin configurations were used for fixation. These different configurations were properly presented in a table, which makes it easy for the readers to judge if this series is applicable to their own practice.
The authors reported on the results of another series of similar fractures in children who were treated without pinning, and they used this group as a control group to conclude that initial operative treatment in their series led to a higher chance of satisfactory outcome without an increased risk of complications. However, results from different case series cannot be compared, and such conclusions, as in this example, cannot be properly drawn.
Although a case series is lower than a randomized controlled trial in the hierarchy of evidence, it could be useful when use of a randomized controlled trial is not appropriate or possible. Even though no causal inferences can be made, a case series is a good way to generate new hypotheses about treatment efficacy and to assess information about the safety and diagnostic accuracy of a treatment. Also, the external validity of a case series often exceeds that of a randomized controlled trial. However, keeping the methodological limitations of a case series in mind, one should be careful in applying its conclusions to clinical practice before more evidence is obtained from randomized trials. The ideal case series would have a prospective design, would contain a clear definition of its population, the intervention, the outcomes, and the amount of follow-up, and would not make any causal inferences about the treatment effect. When designed and conducted in the right way, a case series can be a sensible alternative to studies with higher levels of evidence, with the additional advantage of saving a lot of time and money.
1. Grimes DA, Schulz KF. Descriptive studies: what they can and cannot do. Lancet. 2002;359:145-9.
2. Brighton B, Bhandari M, Tornetta P 3rd, Felson DT. Hierarchy of evidence: from case reports to randomized controlled trials. Clin Orthop Relat Res. 2003;413:19-24.
3. McCulloch P, Taylor I, Sasako M, Lovett B, Griffin D. Randomised trials in surgery: problems and possible solutions. BMJ. 2002;324:1448-51.
4. Carey TS, Boden SD. A critical guide to case series reports. Spine. 2003;28:1631-4.
5. Audigé L, Hanson B, Kopjar B. Issues in the planning and conduct of non-randomised studies. Injury. 2006;37:340-8.
6. Walach H, Falkenberg T, Fønnebø V, Lewith G, Jonas WB. Circular instead of hierarchical: methodological principles for the evaluation of complex interventions. BMC Med Res Methodol. 2006;6:29.
7. Hartz A, Marsh JL. Methodologic issues in observational studies. Clin Orthop Relat Res. 2003;413:33-42.
8. Lloyd-Williams F, Mair F, Shiels C, Hanratty B, Goldstein P, Beaton S, Capewell S, Lye M, Mcdonald R, Roberts C, Connelly D. Why are patients in clinical trials of heart failure not like those we see in everyday practice? J Clin Epidemiol. 2003;56:1157-62.
9. Dalziel K, Round A, Stein K, Garside R, Castelnuovo E, Payne L. Do the findings of case series studies vary significantly according to methodological characteristics? Health Technol Assess. 2005;9:iii-iv, 1-146.
10. Bhandari M, Tornetta P 3rd, Guyatt GH. Glossary of evidence-based orthopaedic terminology. Clin Orthop Relat Res. 2003;413:158-63.
11. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370:1453-7.
12. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473-83.
13. Bellamy N. Pain assessment in osteoarthritis: experience with the WOMAC osteoarthritis index. Semin Arthritis Rheum. 1989;18(4 Suppl 2):14-7.
14. Whelan DB, Bhandari M, McKee MD, Guyatt GH, Kreder HJ, Stephen D, Schemitsch EH. Interobserver and intraobserver variation in the assessment of the healing of tibial fractures after intramedullary fixation. J Bone Joint Surg Br. 2002;84:15-8.
15. Griffin D, Audigé L. Common statistical methods in orthopaedic clinical studies. Clin Orthop Relat Res. 2003;413:70-9.
16. Harwood PJ, Giannoudis PV, Probst C, Van Griensven M, Krettek C, Pape HC; The Polytrauma Study Group of the German Trauma Society. Which AIS based scoring system is the best predictor of outcome in orthopaedic blunt trauma patients? J Trauma. 2006;60:334-40.
17. Park SR, Kang JS, Kim HS, Lee WH, Kim YH. Treatment of intertrochanteric fracture with the Gamma AP locking nail or by a compression hip screw—a randomised prospective trial. Int Orthop. 1998;22:157-60.
18. Kaufer H. Mechanics of the treatment of hip injuries. Clin Orthop Relat Res. 1980;146:53-61.
19. Bonamo JJ, Accettola AB. Treatment of intertrochanteric fractures with a sliding nail-plate. J Trauma. 1982;22:205-15.
20. Chevalley F, Gamba D. Gamma nailing of pertrochanteric and subtrochanteric fractures: clinical results of a series of 63 consecutive cases. J Orthop Trauma. 1997;11:412-5.
21. Jiang SD, Jiang LS, Zhao CQ, Dai LY. No advantages of Gamma nail over sliding hip screw in the management of peritrochanteric hip fractures: a meta-analysis of randomized controlled trials. Disabil Rehabil. 2008;30:493-7.
22. Chen CH, Chen WJ, Shih CH. Fixation of small tibial avulsion fracture of the posterior cruciate ligament using the double bundles pull-through suture method. J Trauma. 1999;46:1036-8.
23. Zhao J, He Y, Wang J. Arthroscopic treatment of acute tibial avulsion fracture of the posterior cruciate ligament with suture fixation technique through Y-shaped bone tunnels. Arthroscopy. 2006;22:172-81.
24. Nicandri GT, Klineberg EO, Wahl CJ, Mills WJ. Treatment of posterior cruciate ligament tibial avulsion fractures through a modified open posterior approach: operative technique and 12- to 48-month outcomes. J Orthop Trauma. 2008;22:317-24.
25. Morshed S, Corrales L, Genant H, Miclau T 3rd. Outcome assessment in clinical trials of fracture-healing. J Bone Joint Surg Am. 2008;90 Suppl 1:62-7.
26. Gartland JJ Jr, Werley CW. Evaluation of healed Colles' fractures. J Bone Joint Surg Am. 1951;33:895-907.
27. Skaggs DL, Sankar WN, Albrektson J, Vaishnav S, Choi PD, Kay RM. How safe is the operative treatment of Gartland type 2 supracondylar humerus fractures in children? J Pediatr Orthop. 2008;28:139-41.
28. Flynn JC, Matthews JG, Benoit RL. Blind pinning of displaced supracondylar fractures of the humerus in children. Sixteen years' experience with long-term follow-up. J Bone Joint Surg Am. 1974;56:263-72.