“Data visualization” may be defined as the display of measured quantities (i.e., data) by means of a variety of graphics, including points, lines, numbers, symbols, words, shading, colors, and pictures, among others. Statistical graphics were not invented until the late 1700s. Those graphics were built mostly around the use of shapes (length and area), time series, and scatter plots. Data visualization is therefore a relatively recent development, one that combines science and arithmetic with art and design. Today, data visualization is all around us; in some cases, data are portrayed graphically in almost real time. Consider, for example, access to crime-related data for a specific city (e.g., San Francisco) and across the country. These data have been displayed in the form of crime maps and made available to the public via a Web site. The site was created to give people easy access to the data for areas of particular interest to them; in effect, they create their own maps of incidents as they occurred in their own neighborhoods (1).
However, it is easy to distort data when presenting them graphically. Much like good writing, good graphical displays of data can communicate complex ideas with clarity, precision, and efficiency. Similarly, much like poor writing, bad graphical displays may distort or obscure the data, making them more difficult to understand, interpret, or compare and contrast; in essence, such displays may distort the real message and communicate inaccurate information. In the most severe cases, they actually may portray a bold lie. Regardless, the intent of data visualization is to present complex quantitative information in a more accessible format that allows interpretation, reasoning, and in-depth consideration of its meaning. Hence, whatever methods are used to depict data graphically, it is very important that they be of high quality, maintain integrity and sophistication, and not distort the inherent meaning of the data presented.
Consider Florence Nightingale, who became a pioneer in the use of data visualization when reporting on the nature and magnitude of the conditions of medical care in the Crimean War. Her audiences were the military leaders, civil servants, and Members of Parliament who otherwise would have had difficulty reading or understanding more traditional statistical reports. Nightingale had a strong mathematical mind and decided to use the “pie chart,” first developed by William Playfair in 1801, to present data graphically. Her diagrams, now known as polar area diagrams, illustrate seasonal sources of patient mortality in military field hospitals. She realized that numbers alone are unlikely to impress most people, including ministers and military leaders. Translating numbers into a picture makes them more dramatic and turns them into messages that are much harder to ignore. If a picture is worth a thousand words, then a graphic is worth 1,000 numbers. Figure 1 presents the “Nightingale Rose” diagram in its recalibrated format (data presented so that the area of each wedge is proportional to the value it represents) (4).
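The idea behind the recalibrated format can be sketched in a few lines of code. In a polar area diagram every wedge spans the same angle, so the area of a wedge grows with the square of its radius; making area proportional to a value therefore means making the radius proportional to the square root of that value. The sketch below is illustrative only: the function names and the monthly counts are hypothetical and are not Nightingale's data.

```python
import math


def wedge_radii(values, max_radius=1.0):
    """Return one radius per wedge so that wedge AREA is proportional to its value.

    All wedges share the same angle, so area ~ radius**2 and
    radius must scale with sqrt(value).
    """
    if not values or min(values) < 0:
        raise ValueError("values must be a non-empty list of non-negative numbers")
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    return [max_radius * math.sqrt(v / largest) for v in values]


def wedge_area(radius, n_wedges):
    """Area of one wedge when a full circle is split into n_wedges equal angles."""
    return math.pi * radius ** 2 / n_wedges


# Hypothetical monthly counts, for illustration only.
counts = [100, 400, 900]
radii = wedge_radii(counts)
areas = [wedge_area(r, len(counts)) for r in radii]
```

With these made-up counts in the ratio 1:4:9, the radii come out in the ratio 1:2:3 and the areas in the ratio 1:4:9. Scaling the radius directly by the value would instead produce areas in the ratio 1:16:81 and greatly exaggerate the largest wedge.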
Another example of data visualization that most worksite health promotion practitioners are familiar with is the much-cited set of obesity maps from the U.S. Centers for Disease Control and Prevention (CDC). CDC scientists have mapped the prevalence of obesity by state across the United States for more than two decades. The sequential display of these data by year provides a compelling overview of the rise in obesity prevalence across the country. Figure 2 shows the data for the years 1990, 2000, and 2010 in a single panel for summary purposes, but interested readers may find the entire slide set on the CDC Web site (5).
APPLICATION TO WORKSITE HEALTH DATA
Considering the above lessons from Florence Nightingale and the CDC, a similar approach may help bring additional attention to underdeveloped or underappreciated areas of interest for worksite health. Health and productivity management may well be one of those areas.
Health and productivity management has emerged as a strategy to support corporate solutions in the field of health improvement and organizational performance. Initially, the health and productivity management field expanded rapidly and received a lot of attention, especially as health-related absenteeism and presenteeism were associated with significant financial impacts to organizations (3,6). However, the measurement of productivity has proven challenging, especially in the area of presenteeism. Presenteeism, being at work when you should be at home because you’re either ill or too tired to be effective at work, has turned out to be somewhat difficult to measure objectively and report convincingly. Although many measurement tools have emerged that measure absenteeism, presenteeism, and total health-related productivity loss based on self-report and with acceptable psychometric properties (7), graphical displays that show the proportional relationships between productivity loss and health factors have been lacking. New, innovative ideas on how to do this may prove useful in communicating the potential and the opportunities of worker health improvement to company leaders.
QUANTIFYING HEALTH AND PRODUCTIVITY
With the help of one of the most widely used self-report productivity measurement tools available, the Work Productivity and Activity Impairment (WPAI) Scale, we measured absenteeism and presenteeism loss related to health factors. These data were obtained using a health assessment survey in which questions on employee health behaviors and the WPAI were included. In 2010, a total of 32,267 employees who worked in companies that implemented a comprehensive worksite health promotion program and who also were members of the HealthPartners health plan were invited to complete the health assessment. A total of 27,217 employees completed the survey (66% completion rate). After the removal of respondents who were not enrolled continuously in the health plan, did not have a pharmacy benefit, or had multiple surveys in the current year, a final sample of 21,410 employees was retained for this project. The productivity loss data were monetized using an average annual salary of $60,002 based on the actual national employment cost index and expressed as excess health-related productivity loss per employee per year in 2010 dollars (2).
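As a rough illustration of what monetizing the productivity loss can look like, the sketch below converts WPAI-style answers into a dollar figure. The text does not spell out the scoring steps used for this project, so the combination of absenteeism and presenteeism shown here (the commonly published WPAI overall work impairment score) is an assumption, as are the function names and the example answers. Only the salary figure comes from the text.

```python
# Average annual salary reported in the text (2010 dollars).
AVERAGE_ANNUAL_SALARY = 60_002


def overall_impairment(hours_missed, hours_worked, impairment_0_to_10):
    """Fraction of work time lost to health, combining absence and reduced output.

    Assumed scoring: absenteeism is the share of scheduled hours missed;
    presenteeism (a 0-10 rating scaled to 0-1) applies to the hours worked.
    """
    scheduled = hours_missed + hours_worked
    absenteeism = hours_missed / scheduled if scheduled > 0 else 0.0
    presenteeism = impairment_0_to_10 / 10
    return absenteeism + (1 - absenteeism) * presenteeism


def monetized_loss(impairment_fraction, salary=AVERAGE_ANNUAL_SALARY):
    """Dollar value of the lost work time per employee per year."""
    return impairment_fraction * salary


# Hypothetical respondent: 4 hours missed, 36 hours worked, impairment rated 2 of 10.
fraction = overall_impairment(hours_missed=4, hours_worked=36, impairment_0_to_10=2)
loss_dollars = monetized_loss(fraction)
```

For this hypothetical respondent the overall impairment is 0.10 + 0.90 × 0.20 = 0.28 of work time, which corresponds to roughly $16,800 per year at the stated salary.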
In calculating excess health-related productivity loss, we first established a subgroup of “healthiest” employees who fit a profile defined by a) having no chronic conditions; b) a healthy body mass index; c) not using tobacco; d) low risk for stress, depression, and alcohol; e) not sedentary; f) not sleeping less than 6 hours per night; and g) not self-reporting pregnancy. These employees were regarded as being “healthiest” and assumed to have no excess health-related productivity loss. The remainder of the population was then considered to have excess health-related productivity loss that can be estimated by health factor. Because of this methodology, even the optimal category of a given health factor may still show productivity loss: people in that category may fall short on other criteria and therefore are not part of the healthiest group.
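The profile above amounts to a simple filter on each respondent's record, which can be sketched as follows. The record layout and field names are hypothetical, and the body mass index cut points (18.5 to less than 25) are an assumption, because the text specifies only “a healthy body mass index.”

```python
from dataclasses import dataclass, replace

# Assumed cut points; the text says only "a healthy body mass index".
HEALTHY_BMI_MIN = 18.5
HEALTHY_BMI_MAX = 25.0
# From the text: criterion f) excludes sleeping less than 6 hours per night.
MIN_SLEEP_HOURS = 6.0


@dataclass(frozen=True)
class Employee:
    """Hypothetical record layout for one survey respondent."""
    chronic_conditions: int
    bmi: float
    uses_tobacco: bool
    high_risk_stress: bool
    high_risk_depression: bool
    high_risk_alcohol: bool
    sedentary: bool
    sleep_hours: float
    pregnant: bool


def is_healthiest(e: Employee) -> bool:
    """True only if the respondent meets every criterion a) through g)."""
    return (
        e.chronic_conditions == 0                       # a) no chronic conditions
        and HEALTHY_BMI_MIN <= e.bmi < HEALTHY_BMI_MAX  # b) healthy body mass index
        and not e.uses_tobacco                          # c) no tobacco use
        and not (e.high_risk_stress
                 or e.high_risk_depression
                 or e.high_risk_alcohol)                # d) low risk on all three
        and not e.sedentary                             # e) not sedentary
        and e.sleep_hours >= MIN_SLEEP_HOURS            # f) at least 6 hours of sleep
        and not e.pregnant                              # g) no self-reported pregnancy
    )


# Hypothetical respondents, for illustration only.
example = Employee(0, 22.0, False, False, False, False, False, 7.5, False)
short_sleeper = replace(example, sleep_hours=5.5)
```

Here `example` meets every criterion, whereas `short_sleeper` has a healthy body mass index but is excluded from the healthiest group on sleep alone. This is the situation described above in which the optimal category of one health factor can still carry excess productivity loss.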
The numbers of interest reflect the prevalence of health factors by category and the dollar amount of productivity loss associated with each category. These numbers have both magnitude and order. Therefore, to display them graphically, the areas portrayed in the graphics must preserve the exact ratios of magnitude and the order of the underlying numbers. Not doing so will result in distorted ratios and inaccurate interpretations of the data. Furthermore, the relationships between the numbers need to be portrayed in a manner consistent with those relationships. For example, a cross-sectional measure displayed across multiple categories of a health factor should not be presented in a line graph because each category is a unique group of people; a scatter plot or a bar graph is more appropriate.
To illustrate data visualization in the area of health and productivity, we selected several health factors and associated the categories within each factor with the productivity loss measured by the WPAI, expressed in dollars per employee per year (PEPY), from our data sample. The health factors selected include body mass index, physical activity, sleep, and the optimal lifestyle metric (OLM). The OLM reflects the degree of adherence to four healthy lifestyle behaviors (i.e., being physically active, not smoking, eating five fruits and vegetables daily, and no misuse of alcohol). The association of OLM with important health outcomes has been described in a previous column (8). The Table presents the actual numbers so that they can be compared with the graphic display of the same data in the panels of Figure 3.
THE GRAPHICAL DISPLAY OF HEALTH AND PRODUCTIVITY DATA
Figure 3 presents four panels that display graphically the relationship between a specific health factor, its categories, the prevalence of each category in the population studied, and the amount of productivity lost per year (in dollars). It is an attempt at data visualization for health and productivity management. As you will note, compared with the Table, the graphics portray a lot of information in an easily accessible format that may be interpreted at multiple levels. For example, in each panel, the healthiest category has the lowest excess health-related productivity loss. At the same time, the display shows how many people are driving this excess productivity loss: the size of each three-dimensional data point (sphere) conveys prevalence, that is, the relative number of people it represents compared with the other spheres. Furthermore, it puts all this information in context across categories of the health factor considered and the amount of financial loss associated with those categories.
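One way to prepare the data behind a panel of this kind is sketched below: each category becomes a point whose horizontal position is the category order, whose height is the productivity loss, and whose marker area is proportional to prevalence. The function name, the output layout, and the three rows are hypothetical and do not reproduce the values in the Table; the sketch also treats marker size as a flat area and does not attempt the three-dimensional rendering used in Figure 3.

```python
def bubble_points(rows, max_area=1000.0):
    """Turn (category, prevalence, loss) rows into plottable bubble specifications.

    Marker AREA (not radius) is scaled linearly with prevalence so that the
    visual size of each bubble keeps the same ratios as the numbers.
    """
    largest = max(prevalence for _, prevalence, _ in rows)
    if largest <= 0:
        raise ValueError("at least one category must have a positive prevalence")
    return [
        {
            "x": position,                         # category order on the horizontal axis
            "label": category,
            "y": loss_pepy,                        # dollars per employee per year
            "area": max_area * prevalence / largest,
        }
        for position, (category, prevalence, loss_pepy) in enumerate(rows)
    ]


# Hypothetical values for illustration only; not the numbers in the Table.
rows = [
    ("Normal weight", 35.0, 500.0),
    ("Overweight", 40.0, 800.0),
    ("Obese", 25.0, 1500.0),
]
points = bubble_points(rows)
```

Plotting tools that size markers by area can use the `area` values directly. A tool that sizes markers by radius or diameter would need the square root of these values to keep the areas proportional.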
PRINCIPLES TO ENSURE GRAPHICAL INTEGRITY
Although graphics may be used to present data in a more “accessible” format to a given audience, it is important to make sure that the data are represented in a truthful manner. That means that the display of numbers, which measure quantity, has to abide by a faithful representation of both magnitude and order. Edward Tufte, one of the world’s leading analysts of graphic information, presents a list of six principles to be followed when it comes to graphical integrity (9):
1. The representation of numbers, as measured physically on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
2. Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity.
3. Show data variation, not design variation.
4. In time series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units.
5. The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
6. Graphics must not quote data out of context.
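Two of these principles lend themselves to a quick numerical check, sketched below. For principle 1, Tufte's “lie factor” divides the size of the effect shown in the graphic by the size of the effect in the data, so a value near 1 indicates a proportional display. For principle 4, a nominal dollar amount is restated in base-year dollars using a price index. The function names and all example numbers, including the index values, are hypothetical.

```python
def relative_change(first, second):
    """Size of an effect, expressed relative to the starting value."""
    if first == 0:
        raise ValueError("the starting value must be nonzero")
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data (principle 1)."""
    effect_in_data = relative_change(data_first, data_second)
    if effect_in_data == 0:
        raise ValueError("the data show no change, so the ratio is undefined")
    return relative_change(drawn_first, drawn_second) / effect_in_data


def deflate(nominal_amount, index_in_year, index_in_base_year):
    """Restate a nominal amount in base-year dollars (principle 4)."""
    return nominal_amount * index_in_base_year / index_in_year


# A value doubles (100 -> 200), but the circle's radius is doubled,
# so the drawn area quadruples (1 -> 4).
factor = lie_factor(data_first=100, data_second=200, drawn_first=1, drawn_second=4)

# Hypothetical index values: 110 in the reporting year, 100 in the base year.
real_dollars = deflate(nominal_amount=110.0, index_in_year=110.0, index_in_base_year=100.0)
```

In this made-up case the data change by 100% while the drawn area changes by 300%, for a lie factor of 3, and the nominal $110 corresponds to $100 in base-year dollars.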
Despite the observed increase in interest in worksite health promotion programs among corporate executives in the United States in recent years, it may be difficult for worksite health promotion practitioners to present the underlying issues and rationale that prompt and justify investment in programs. Statistical reports and numbers may speak well to executives. However, as Florence Nightingale’s approach suggests, if a single graphic is worth 1,000 numbers, then an honest, clear, and sophisticated display that integrates multiple messages based on quantitative information may trump a numbers-only dashboard any day. The example presented here is just one attempt; it needs to be improved on, expanded to longitudinal data, and connected to direct feedback from the intended audience. Regardless, when considering the use of visual data displays, be sure to reflect the nature of the data correctly, present the magnitude and the order appropriately, and add any needed context information to ensure that your audience receives an honest and compelling view.