Surgical and procedural technologies are evolving at a rapid pace. Dissemination of new techniques often requires training in new skills. Performance changes with experience over time: rapid improvement occurs during early learning, a period in which poorer outcomes may also be seen. For example, during the introduction of laparoscopic cholecystectomy in Western Australia, the prevalence of complications doubled.^{1} The “learning curve” is often used to describe this phenomenon and can be described as having 3 components: the starting point, the slope, and the plateau of the curve (Fig. 1).

*Starting Point:* Each person's individual experiences and background combine to produce an initial level of expertise in performing a new procedure.

*Slope:* How quickly a person learns the new task. The slope varies by procedure, by person, and by how many times and how often the person has performed the procedure.

*Plateau:* The point at which the incremental change in the outcome being measured becomes negligible. This is usually the point at which a person is deemed experienced in that particular task.
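The 3 components above can be captured in a simple parametric sketch of a learning curve. The exponential-decay form below is one common choice for illustration, not a model drawn from this review; the starting level, plateau, and rate values are purely hypothetical.

```python
import math

def learning_curve(n, start, plateau, rate):
    """Expected outcome (eg, operative time in minutes) after n cases,
    modeled as exponential decay from a starting level toward a plateau.
    'rate' sets the slope of the curve: larger values mean faster learning."""
    return plateau + (start - plateau) * math.exp(-rate * n)

# Hypothetical surgeon: starts at 240 minutes, plateaus near 150 minutes.
times = [learning_curve(n, start=240.0, plateau=150.0, rate=0.05)
         for n in range(101)]
```

Here `times[0]` is the starting point, the slope at case n is proportional to `rate * (times[n] - plateau)`, and the plateau is approached asymptotically; in this illustrative parameterization the predicted time after 100 cases is within about 1 minute of the 150-minute plateau.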

Advances in surgery are constantly being made, and surgeons are being trained on new procedures and devices, from laparoscopic to robotic to single-incision surgeries. With this advancement, a body of literature is emerging on the associated learning curves. The existence of a learning curve clearly has implications for training and the adoption of new procedures and devices. In addition, learning curves complicate, and have been noted as a stumbling block to, the rigorous evaluation of new procedures.^{2},^{3}

A standardized method to analyze learning curves would be helpful in many aspects of surgical research and training including the following:

* Comparing the introduction of devices or changes in procedures to established ones

* Correcting for learning curves in randomized controlled trials (RCTs)

* Comparing learning curves of institutions and surgeons to identify areas of improvement

* Defining plateau levels to be reached for residency requirements or privileges at hospitals

* Assessing the use of simulation-based training to decrease or shorten the learning curve

A systematic review of the health technology literature was completed over 10 years ago with a challenge to the health technology community to improve statistical analysis of learning curves.^{2},^{4} The number of articles that purport to address the learning curve has continued to increase, perhaps along with an increasing recognition of the importance of the learning curve in regard to training and the evaluation of new procedures and devices. Our aim is to update the previous review in the field and assess whether there have been any changes in approaches to analysis. We specifically aimed to identify the methods used to define, portray, and statistically analyze the learning curve in minimally invasive abdominal surgery and make recommendations for clinicians, credentialing bodies, and researchers.

#### METHODS

This study was planned, conducted, and reported in adherence to PRISMA standards.^{5}

##### Study Questions

The objectives of this review were to determine how learning curves are assessed in the surgical literature and define an ideal framework for this analysis with regard to the following:

* Which study designs are used to assess the learning curve?

* Which outcomes are used to assess the learning curve?

* How are learning curves portrayed and graphed?

* How are learning curves statistically analyzed?

##### Information Sources

A broad systematic search was performed of clinical databases (MEDLINE, EMBASE, ISI Web of Science, ERIC, and the Cochrane Library) from 1985 to August 21, 2012, to identify papers in minimally invasive surgery that mention a learning curve. In addition, a hand search of all abstracts from 1985 to 2012 in Surgical Endoscopy was performed as well as a review of all preprint articles in the American Journal of Surgery, British Journal of Surgery, Annals of Surgery, Archives of Surgery, World Journal of Surgery, Journal of the American College of Surgeons, Surgical Endoscopy, and Surgery. During the data collection process, the bibliography of each article was reviewed for additional articles.

##### Search Strategy

The search terms used were “laparoscopic surgery” or “simulation” or “Surgical Procedures, Minimally Invasive” and “learning curve” for Medline and “Laparoscopic surgery” or simulation or “minimally invasive surgery” and “learning curve” for ISI Web of Science, EMBASE, Cochrane, and ERIC.

##### Eligibility Criteria

The focus of this review is minimally invasive surgery, because this is an area in which new procedures have emerged and disseminated rapidly in the past 2 decades. To obtain a more homogeneous group of procedures, the articles were limited to minimally invasive abdominal surgery, that is, gastrointestinal, gynecological, or urological procedures, as well as simulated versions of those procedures. The article had to address the learning curve and formally analyze it by a graph, table, or statistical technique. Only English language articles were included. Reviews, letters, and comments were excluded.

##### Study Selection

Study selection was accomplished by screening the abstracts of all articles retrieved with the search terms described earlier. The full texts of the articles were then obtained and assessed against the eligibility criteria. A second reviewer assessed 10% of the abstracts to ensure that the study selection process was robust. Any disagreements were discussed and resolved by consensus.

##### Data Items Collected

All variables were collected for all papers; these included which outcomes were used to assess the learning curve, how the learning curves were portrayed, graphed, and statistically analyzed, and which parts of the curve were addressed: the intercept, slope, or plateau. A full list of variables can be found in Supplemental Table 1, available at http://links.lww.com/SLA/A552.

We reviewed robotic-assisted laparoscopic radical prostatectomy (RALP) in further detail; in addition to the standardized analysis, we extracted the specific data for the learning curve that the articles reported as well as the information about the specific technique, patient baseline data (age, body mass index, American Society of Anesthesiologists score, etc), and surgeon data.

##### Data Collection Process

A data-extraction checklist was generated and the relevant data abstracted from the full text of the selected articles. The data extraction was also completed in duplicate for 10% of the cases. The only discrepancies between the 2 reviewers that arose were in the classification of statistical techniques. These were resolved through discussion with a statistician.

##### Synthesis of Results

Data collection and analysis were performed in Microsoft Excel (Microsoft Corp., Redmond, WA), SPSS (IBM Corp., Armonk, NY), and Stata software (StataCorp LP, College Station, TX). To assess trends over time, data were merged into 5-year groups; complexity of analysis was ordered by the most complex approach used, from none, descriptive, split groups [eg, analysis of variance (ANOVA)], univariate trend (eg, χ^{2} test for trend), and multivariate trend (eg, regression) to trend analysis adjusted for clustering (multilevel regression). The Pearson χ^{2} test and the χ^{2} test for trend were used to compare different studies and different time frames of studies. Statistical significance was accepted at the 2-sided 5% level.
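As an illustration of the χ² test for trend mentioned above, the sketch below implements one common form of it (the Cochran-Armitage z statistic, whose square is the 1-df χ²-for-trend statistic) in plain Python; the study counts are invented for the example.

```python
import math

def trend_z(successes, totals, scores=None):
    """Cochran-Armitage test for trend across ordered groups.
    Returns the z statistic; |z| > 1.96 indicates a trend at the
    2-sided 5% level. Groups are scored 0, 1, 2, ... unless given."""
    if scores is None:
        scores = list(range(len(totals)))
    n_total = sum(totals)
    p = sum(successes) / n_total
    t = sum(s * (r - n * p) for s, r, n in zip(scores, successes, totals))
    var = p * (1 - p) * (
        sum(s * s * n for s, n in zip(scores, totals))
        - sum(s * n for s, n in zip(scores, totals)) ** 2 / n_total
    )
    return t / math.sqrt(var)

# Invented counts: studies using a complex analysis per 5-year group
z = trend_z(successes=[10, 20, 30, 40], totals=[100, 100, 100, 100])
```

With these invented counts the proportion rises steadily across groups, so z comes out well above 1.96; a flat sequence of proportions yields z = 0.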

##### Risk of Bias

Publication bias most likely exists, because articles that do not show a statistical difference in the learning curve may be less likely to be submitted for publication than those that do. Also, in the analysis of learning curves, more variables may have been collected than were published, having been excluded from the final analysis because they did not show a clear change in slope or outcome. Bias in the collection process was checked through duplicate data collection of 10% of the data.

Each article was graded using a system consisting of 6 components, each category receiving 1 point:

1. Adequate detail of study population (age, sex, and clinical presentation)

2. Adequate detail of technique

3. Adequate detail of assessment

4. Study powered according to sample size calculation

5. No prior knowledge of outcome before inclusion of patient (prospective)

6. Blinding of assessment.

Adequacy was determined by whether the reviewer believed that he/she would be able to reproduce the study from the information included in the methods section of the article.

#### RESULTS

##### Study Selection

We identified 6758 articles that were potentially relevant; of these, 6585 were identified through the search strategy and 173 were identified through hand searching selected journals, searching preprint articles, and reviewing references of the aforementioned articles. From these, we identified 592 articles that were relevant to abdominal laparoscopic, robotic, or simulated surgery and that formally addressed the learning curve of these procedures as seen in Figure 2. The full reference list can be found in Supplemental Table 2, available at http://links.lww.com/SLA/A553.

##### Study Design and Demographics

Table 1 summarizes the key characteristics of the included studies. Using the grading system, the articles on average received 3.4/6 points (range: 0–6). Most studies had adequate detail of population, technique, and design whereas very few (9, 2%) included a power calculation or used blinded assessment (49, 8%). The size of the studies varied from 6 to 43,028 cases with an average size of 677 cases and median of 115 cases (interquartile range: 54–300).

The number of articles addressing learning curves is steadily increasing every year as seen in Figure 3. The number of studies assessing simulation and robotics specifically is also increasing.

##### Learning Curve Outcomes

The studies measured on average 4.1 outcome variables, with a range of 1 to 19. Procedure time was the most commonly used proxy for the learning curve (508, 86%). We grouped the other outcomes based on the type and timing of the outcome, that is, intraoperative outcomes, postoperative outcomes, intraoperative technical skills, and patient-oriented outcomes. Intraoperative outcomes were assessed 622 times in 333 (56%) articles. Postoperative outcomes were assessed in 322 (54%) articles, for a total of 753 times. Most studies differentiated between intraoperative and postoperative complications, but 29 studies used both outcomes without differentiation and 8 provided no details about which type of complications was measured. Intraoperative technical skills were measured 171 times in 102 (17%) of the articles. Finally, the least used outcome category was patient-oriented outcomes (49, 8%), which included amount of pain medication, time to oral intake, and quality of life. A detailed list of the outcomes can be found in Table 2.

##### Graphically Displaying the Learning Curve

Most articles (425, 72%) graphed the learning curve, with the most common method being a scatterplot (298, 50%); 107 (19%) of the scatterplots had a curve fit line superimposed, of which 64 (15%) had a regression line and 27 (6%) used a smoother such as moving average.

Many papers used straight lines to approximate the learning curve (12%), but authors also chose other curves (11%). These included not only logarithmic and exponential curves but also curves that allow for multiple inflection points (eg, joinpoint or spline regression) and those that allow for smoothing of the data [eg, locally weighted scatterplot smoothing (LO[W]ESS), which uses multiple regression models]. The joinpoint regression model allows for different regression lines between points; the number of joinpoints required can be evaluated by considering the fit of the resultant curve. A LOESS fit similarly allows for multiple inflections or changes in slope, but it does so through local weighting of the data and the generation of a separate linear regression line for each data point, which collectively produce a smoothed curve. However, it does not readily allow a statistical assessment of the fit of the resultant curve.
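The idea behind joinpoint regression can be sketched in a few lines: fit a regression line on either side of each candidate breakpoint and keep the breakpoint that minimizes the total squared error. This single-joinpoint grid search is a simplified illustration (real joinpoint software also tests whether each additional joinpoint is statistically justified); the operative-time series is invented.

```python
def ols(xs, ys):
    """Least-squares line; returns (slope, intercept, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - slope * x - intercept) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def single_joinpoint(xs, ys, min_seg=3):
    """Return the split index k (segments xs[:k] and xs[k:]) whose
    2-segment fit has the smallest total squared error."""
    best_k, best_sse = None, None
    for k in range(min_seg, len(xs) - min_seg + 1):
        sse = ols(xs[:k], ys[:k])[2] + ols(xs[k:], ys[k:])[2]
        if best_sse is None or sse < best_sse:
            best_k, best_sse = k, sse
    return best_k

# Invented series: operative time falls steeply for 20 cases, then slowly
xs = list(range(1, 41))
ys = [240 - 4 * x if x <= 20 else 160 - 0.5 * (x - 20) for x in xs]
k = single_joinpoint(xs, ys)  # breakpoint recovered near case 20
```

The fit of curves with different numbers of joinpoints can then be compared via their residual error, which is the evaluation strategy described above.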

Of the articles that used specific parts of the curve to define the learning curve (135, 23%), most articles used the plateau (85, 63%) and some described the slope or learning rate (34, 25%).

##### Learning Curve Statistical Analysis

Many papers used a formal statistical analysis of the learning curve (435, 73%), the most common being univariate split-group comparisons (413, 70%) such as the Student *t* test, χ^{2} test, or simple ANOVA. As an example, a study of 100 laparoscopic cholecystectomies would be split into 4 groups of 25 cases and the mean duration of the operation compared between the groups.^{6}
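The split-group example above can be sketched as a one-way ANOVA computed by hand; the operative times below are invented for illustration.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA comparing group means
    (between-group mean square over within-group mean square)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Invented data: 100 cholecystectomies split into 4 consecutive groups
# of 25; mean operative time drifts down as experience accumulates.
groups = [[90 - 5 * q + (i % 5) for i in range(25)] for q in range(4)]
f_stat = one_way_anova_f(groups)  # compare to the F(3, 96) critical value, ~2.70
```

An F statistic above the critical value indicates that mean operative time differs between the consecutive case groups, the usual conclusion drawn from this split-group approach.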

Univariate trend analysis (194, 33%), such as Pearson correlation, repeated-measures ANOVA, or Friedman test, was also quite commonly used. A regression model was used in 129 (22%) of analyses with linear regression (58, 10%) and logistic regression (37, 6%) being the most common. Least squares regression was used in 14 articles, nonlinear in 12, spline curves in 6, and polynomial in 5.

More authors are reflecting the inherent differences between surgeons in their statistical analyses and adjusting for them. Twenty-one (4%) articles adjusted for clustering (typically at the surgeon level), mostly using generalized linear mixed models (13), but also generalized estimating equations (3) and hierarchical models (2). These models reflect the expectation that surgeries performed by the same surgeon will be more similar to each other than to surgeries by another surgeon. As well as having different underlying levels of performance, different surgeons will likely learn at different rates. These factors lead to the grouping, or clustering, of data by surgeon, which must be reflected for accurate representation and analysis of the data. Risk adjustment to account for patient characteristics was performed in only 5 studies.
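How strongly surgeries cluster by surgeon can be quantified with the intraclass correlation (ICC), the quantity that clustered models such as mixed models are built around. The one-way, balanced-cluster sketch below uses invented surgeon data and is a simplification of the generalized linear mixed models the articles applied.

```python
def icc_oneway(groups):
    """One-way intraclass correlation for balanced clusters:
    (MSB - MSW) / (MSB + (m - 1) * MSW), where m is cluster size.
    Values near 1 mean cases from the same surgeon resemble each
    other far more than cases from different surgeons."""
    k, m = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (k * m)
    msb = m * sum((sum(g) / m - grand) ** 2 for g in groups) / (k - 1)
    msw = sum(sum((x - sum(g) / m) ** 2 for x in g)
              for g in groups) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

# Invented data: 3 surgeons with distinct baseline operative times
surgeons = [[150 + i for i in range(10)],
            [200 + i for i in range(10)],
            [250 + i for i in range(10)]]
icc = icc_oneway(surgeons)  # close to 1: strong clustering by surgeon
```

A high ICC is precisely the situation in which ignoring the surgeon level (eg, pooling all cases into one regression) misrepresents the data, motivating the mixed-model and estimating-equation approaches noted above.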

Cumulative Sum (CUSUM) analysis was used in 25 (4%) of the articles. CUSUM analysis applies to a dichotomous variable: the author sets an acceptable failure rate and an unacceptable failure rate appropriate for that variable. CUSUM analysis can then be used to monitor change, with the slope of the CUSUM curve at each procedure point increasing for a failure or decreasing for a success. It has traditionally been used in steady-state processes, where the acceptable complication rates are known and one can monitor whether the complication rate changes significantly. A learning curve plotted with CUSUM would initially show a series of positive slopes, indicating that the failure rate of the surgery exceeds the acceptable value. The slope would then start to trend downward, indicating that the surgeon is learning and the failure rate is becoming acceptable.
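A minimal version of the learning-curve CUSUM described above can be sketched as follows, assuming only an acceptable failure rate p0 chosen by the investigator; the outcome sequence is invented.

```python
def cusum(outcomes, p0):
    """Cumulative sum of (observed failure - acceptable failure rate).
    Each failure (1) raises the curve by 1 - p0; each success (0)
    lowers it by p0, so a rising curve means the observed failure
    rate exceeds p0 and a falling curve means it is below p0."""
    total, curve = 0.0, []
    for x in outcomes:
        total += x - p0
        curve.append(total)
    return curve

# Invented outcomes: failures cluster early, then the surgeon improves
outcomes = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
curve = cusum(outcomes, p0=0.1)  # rises early, peaks, then declines
```

Plotting `curve` against case number reproduces the described shape: an initial run of positive slopes while failures exceed the acceptable rate, followed by a downward trend as the surgeon learns.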

##### Comparing Laparoscopic, Robotic, and Simulation Studies

Of the articles, 398 were studies on laparoscopic surgery, 94 on robotic surgery, 76 on simulation (virtual reality simulators and desktop trainers), and 24 on animal models. Because of the inherent differences in technique, we compared study size, study type, and type of operator, as well as the type of learning curve outcome collected. The results can be seen in Table 3. To aid the statistical analysis, the outcomes collected were categorized into procedure time plus the 4 groups discussed earlier.

The use of different study designs varied by study type: simulation and animal model studies used RCT designs more often than laparoscopic and robotic studies (*P* < 0.001) and were more often prospective (*P* < 0.001). Animal studies tended to have the smallest number of cases, laparoscopic and robotic studies were generally similar, and simulation studies tended to be larger. Simulation and animal studies used student and trainee operators (with or without attending surgeons) more often than the other 2 types of studies. All groups collected substantial intraoperative data. The simulation studies collected predominantly technical skills data, whereas studies in the operating room collected both technical skills and outcomes data. Postoperative data and patient-specific data were collected only in laparoscopic and robotic studies.

##### Detailed Review of RALP

Of the 37 robot-assisted laparoscopic prostatectomy articles found in the original review, 4 were excluded because they did not address a continuous learning curve, that is, they compared experienced with inexperienced operators or examined the effect of prostate size on the learning curve. The mean size of the 33 studies was 297 procedures (median: 147; range: 24–3744).^{7–39} All of the articles specified the type of da Vinci robot equipment used. Eighteen studies used the Vattikuti Institute prostatectomy technique^{40} with some adjustments. The mean/median ages of the patients ranged from 56 to 67 years, body mass index from 18 to 36 kg/m^{2}, and preoperative prostate-specific antigen from 5.3 to 22.3 ng/mL. In most articles, surgery was performed only on clinically localized disease.

Compared with the other procedures, the RALP articles examined more variables and more ways of measuring the learning curve. Of the 33 articles, 29 assessed the learning curve using operation duration, 19 looked at estimated blood loss (EBL), 17 at positive surgical margins (PSM), 12 at length of hospital stay, and 8 each at continence and complications. Of the articles that looked at time, 26 found a statistically significant decrease whereas 3 found a decreasing trend. Articles defined the cutoff for the learning curve in 3 main ways. Many used a strict 3- or 4-hour operative time cutoff for proficiency. Others reviewed the literature and picked a common cutoff such as 25 or 30 procedures. Others, like Sammon et al,^{29} graphed the data, found the plateau, and then split the data accordingly. The articles that defined the learning curve using time found a range of 12 to 750 procedures, with a median of 30 procedures. The postlearning curve operative time had a mean (SD) of 207 (46) minutes, ranging from 134 to 305. Some articles found a 2-stepped learning curve; for example, Jaffe et al^{17} found an initial plateau at 12 procedures and another at 189. Twelve articles defined operating time as skin-to-skin time, others defined it as console time, and some did not specify.

Of the articles that looked at EBL, 12 found a statistically significant decrease, 3 found no difference, and 23 found a trend toward reduction. Of the 17 articles that looked at PSM, 12 found a statistically significant difference and 2 a decreasing trend, whereas 3 found no difference. The articles that used PSM to define the learning curve found it to range from 30 to 1250 procedures, varying also by type and stage of the cancer. Most of these articles defined the learning curve between 100 and 300 cases for PSM. Although only 2 of the 8 articles that assessed complications along the learning curve found that they decreased, 1 article defined the plateau at around 150 procedures.^{14} Continence rates at various follow-up times were also used to measure the learning curve, and 6 of the 8 articles found that they increased as surgeons gained more experience. The mean grading score for these 33 articles was 3.2, with a range of 0 to 4.

To give an overview of the learning curve data, a composite graph of operative time across the learning curve studies was created (Fig. 4).

##### Study Quality

Most studies described the detail of the study population, technique, and learning curve assessment adequately (93%, 92%, 88%, respectively). Fifty-seven percent had no prior knowledge of outcome before inclusion of the patients. Very few studies were powered according to sample size calculation (2%). Blinding of the assessment was mentioned in only 8% of the studies.

##### Changes in Outcomes Over Time

A wider variety of techniques, including more advanced ones, has been used to analyze learning curves over the last 10 years. However, there was no statistical evidence of an increase in the complexity of analysis, despite a numerical increase in the most complex approach (regression methods that adjust for clustering) over that period. There was statistical evidence of an increase in laparoscopic and robotic studies from 1997 to 2012 (*P* < 0.001 for both) though not in simulation (*P* = 0.051) or animal (*P* = 0.481) studies. There was no evidence of a change in the proportion of RCTs (*P* = 0.335).

#### DISCUSSION

Learning new procedures is a constant part of surgical innovation and improvement, as well as of the training of new surgeons. Each new procedure brings a learning phase, and this is increasingly being recognized in the literature, as seen in the growing number of articles over time. The outcomes used and analysis techniques are varied. The increase in articles using regression and trend analysis shows growing recognition of the complexity of the learning curve and the need for methods that address it. Many trends seen in the data reflect trends in the surgical field, including an increase in the number of articles discussing robotic surgery.

The outcomes analyzed often seemed to reflect the accessibility of data collection and may not necessarily be those of greatest clinical importance. For example, many simulators automatically collect error rate, instrument movement, and path length, whereas these outcomes are hardly ever available in clinical cases. Similarly, complications, blood loss, and duration of the procedure are often recorded during surgery, and these data are readily available for retrospective or prospective reviews. Even though these are the most commonly used outcomes, some studies do choose to look at postoperative patient outcomes, including readmission, reoperation, or pain, which is important when assessing the effect of the learning curve on the system and the patient as a whole.

CUSUM analysis continues to be used in learning curve analysis, but the limitation remains that the investigator has to determine the acceptable failure rate for the outcome being studied. At present, there are typically insufficient data to confidently set failure rates. Once learning curves for procedures have been determined and an agreed-upon failure rate has been established, CUSUM analysis can be used to assess individual practitioners' learning. CUSUM analysis is already used in real time in manufacturing settings, and it could be used in the same manner for surgical performance, prompting review of a surgeon's operative outcomes based on a predefined failure rate. Conversely, CUSUM analysis could also identify a surgeon with an accelerated learning curve, who could then progress to independent practice at an earlier stage of the predefined learning pathway.

##### Ideal Assessment of the Learning Curve

This review analyzed 592 articles, yet only 9 (1.5%) of the studies were powered a priori to detect a statistical difference in the learning curve for specific outcomes related to the procedure of interest. There is a possibility of bias with regard to the outcomes measured, the techniques used for analysis of the learning curve, and the findings reported by authors. This makes it difficult to conclusively delineate a set of guidelines for learning curve analysis. Instead, we make recommendations, rather than define a standard, for how to ideally assess a learning curve, knowing that the available resources and sample sizes vary significantly between procedures and institutions.

To realize the full potential of the learning curve, whether comparing the introduction of devices or changes in procedures, correcting for learning curves in RCTs, identifying areas of improvement for surgeons or institutions, defining levels for residency or hospital privileges, or assessing the use of simulation-based training, a standard approach to its analysis needs to be adopted. As this article has attempted to elucidate, there is a great deal of complexity in defining values for an ideal learning curve. Multiple factors lead to a successful procedure, including the skills and experience of the surgeon and the operative team and the varying complexity of individual cases. An ideal analysis would include these aspects in a multivariate analysis.^{41}

No single outcome best represents the success of a procedure; hence, multiple outcomes are preferable, and it would be reasonable to expect the impact of learning to vary between outcomes. Possible outcomes include operative time; intraoperative and postoperative outcomes such as complications, transfusions, and recurrence; and, even though more time-intensive, technical skills scores and patient quality of life.
In this study, time was the most frequently used outcome measure to define the learning curve. Although simple to measure, this metric belies the complexity of the operative procedure with regard to learning curve analyses. External factors such as patient and disease variability, operative team familiarity, and institutional dynamics can affect the time taken to perform a procedure. Furthermore, time may not be the measure of prime importance with regard to maintaining the quality and safety of the surgical intervention. Many factors confound the usefulness of procedure time as an outcome variable; if it is used to assess learning curves, other measures of performance and patient complexity must also be assessed. Outcomes need to be clearly defined to be meaningful.^{42} Because of the inherent differences between surgeons and institutions, more than 1 surgeon and more than 1 institution should be assessed and the differences accounted for in the analysis, for example, by generalized estimating equations or linear mixed models. Sequential cases of the same surgeon are preferred to cases of surgeons at different levels for these same reasons. A graphical display of the data should be provided whenever possible. In summary, an ideal learning curve assessment would include the following:

* Multivariate analysis: accounting for different surgeons, teams, and patients

* Multiple clearly defined outcome variables

* Multiple institutions and surgeons

* Graphical display of data

It is also important to note that learning in the clinical context does not occur along a single trajectory or line. Training is becoming more refined, involving a stepwise process of knowledge acquisition, simulation-based practice, achievement of competence levels, and guided progression through the complexity of the operative procedure. Statistical techniques such as joinpoint or spline regression can take this into account; only 9 (1.5%) of the articles used such analyses.

##### Limitations

This systematic review of the literature is limited by the reporting and methodology of the published work to date. Most articles did not report a power analysis for assessment of the learning curve. It is thus possible that the outcomes measured and techniques used were skewed toward methods that were found to be statistically significant. The search criteria, although broad, could have missed articles; a total of 40 articles missed by the initial search (which found 6585 articles) were identified through hand searching journals and reviewing references. The articles were mostly reviewed, and data extracted, by 1 reviewer, which could have led to inconsistencies; a second reviewer did review 10% of the articles to ensure that data collection followed the determined protocol. The level of reported data and procedures made it difficult to summarize across studies and limited the analysis.

##### Strengths

This is a comprehensive review of the surgical learning curve literature to date; at 592 articles, it is larger than any previous review.^{2} We discuss the state of learning curve analysis and make recommendations that can be used by authors who wish to assess the learning curve.

##### Implications

The learning curve is expensive and can lead to morbidity and mortality. At a population level, there is concern that it could lead to decreased quality of care and increased costs of care. This leads to a decreased value of health care during the learning period (value being quality divided by costs).^{43} By understanding the learning curve better, we would be able to start recouping the value of innovation and learning, an area yet to be investigated.

#### CONCLUSIONS

Learning curves are an important aspect of complex procedures. Assessment is essential to inform surgical training and evaluation of new procedures in the clinical setting. Learning curves are being analyzed in a variety of ways. By reviewing the existing learning curve literature, we have developed basic guidelines for choosing the outcome to be analyzed, statistical analysis, and graphical display of the learning curve. There is still a need for further improvement in the methodology and data used to provide more informative findings though some examples of large and informative studies have been conducted.^{44},^{45}

##### ACKNOWLEDGMENTS

The author contributions were as follows: Study concept and design: Harrysson, Aggarwal, Cook, Feldman. Analysis and interpretation of data: Harrysson, Aggarwal, Cook, Sirimanna. Drafting of Manuscript: Harrysson, Aggarwal, Cook. Critical revision of the manuscript: Harrysson, Aggarwal, Cook, Feldman, Darzi. Statistical analysis: Cook. Iliana Harrysson has full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.