Letters to the Editor
DAG Program: Identifying Minimal Sufficient Adjustment Sets
Knüppel, Sven; Stang, Andreas
Department of Epidemiology; German Institute of Human Nutrition Potsdam-Rehbruecke; Nuthetal, Germany; email@example.com (Knüppel)
Institute of Clinical Epidemiology; Martin-Luther-University of Halle-Wittenberg; Halle (Saale), Germany (Stang)
To the Editor:
An important source of bias in observational studies is confounding. According to the classic definition of confounding, a factor is a confounder if it is a risk factor for the disease and influences the occurrence of exposure. In addition, the factor must not be affected by the exposure of interest (that is, it must not be an intermediate variable).1,2 The introduction of causal diagrams (directed acyclic graphs, or DAGs) into the epidemiologic literature has established a new approach to conceptualizing confounding, along with new rules for identifying minimal sufficient adjustment sets.3–6
Complex causal diagrams may contain hundreds of backdoor paths, so identifying minimal sufficient adjustment sets may become complicated. To identify such sets, a strict set of rules must be followed. Because the DAG rules are logical rules, a software program can apply them to identify the minimal sufficient adjustment sets. We developed an MS-DOS command-line analysis tool designed to select minimal sufficient adjustment sets within directed acyclic graphs. The main approach for identifying closed loops in the graph and finding all backdoor paths is the so-called backtracking algorithm.
Backtracking is an algorithm for solving combinatorial problems.7 It searches systematically for a solution among all alternatives, working on the trial-and-error principle. The backtracking algorithm incrementally adds a new candidate to the interim solution until a final solution is found. When an interim solution cannot lead to a final solution, the last step is removed and an alternative path is sought. Either all solutions are found or it is shown that no solution is possible. The running time of the algorithm depends on the number of backdoor paths.
The classic example of backtracking is the 8 queens' puzzle, in which 8 chess queens must be positioned on a chess board in such a way that none of them is able to capture any other. The well-known Sudoku puzzle is another example of a puzzle that can be solved by backtracking.
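The add-a-candidate, remove-the-last-step pattern can be illustrated with the queens puzzle. The following Python sketch is an illustration only and is not taken from the DAG program; the names solve_queens, safe, and place are hypothetical.

```python
def solve_queens(n=8):
    """Return all placements of n non-attacking queens.

    Each solution is a list of column indices, one per row.
    """
    solutions = []
    columns = []  # interim solution: columns[r] is the queen's column in row r

    def safe(col):
        # A new queen in the next row must not share a column or a
        # diagonal with any queen already placed.
        row = len(columns)
        return all(col != c and abs(col - c) != row - r
                   for r, c in enumerate(columns))

    def place():
        if len(columns) == n:          # final solution reached
            solutions.append(list(columns))
            return
        for col in range(n):
            if safe(col):
                columns.append(col)    # add a new candidate
                place()                # try to complete the interim solution
                columns.pop()          # remove the last step, seek an alternative

    place()
    return solutions


if __name__ == "__main__":
    print(len(solve_queens(8)))
```

Because every alternative is eventually tried, the search finds all 92 placements on the standard 8 × 8 board, and it returns an empty list for board sizes that have no solution (for example, n = 3).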
In the case of DAGs, the investigator uses the best available a priori knowledge to set up the most plausible causal diagram. The DAG program then follows strict DAG rules to identify the minimal sufficient adjustment sets for the given DAG. First, all covariates affected directly by the exposure are detected. Thereafter, the graph is checked for closed loops. If a closed loop is found, the program stops (such a graph violates a necessary assumption of causal diagrams). If the graph is acyclic, the backtracking algorithm identifies all backdoor paths and then classifies them as blocked or unblocked. Potentially sufficient adjustment sets are derived such that all backdoor paths are blocked. The sufficient adjustment sets with the lowest number of covariates are called minimal sufficient adjustment sets. If a minimal sufficient adjustment set contains one or more colliders, the program identifies these colliders and suggests additional adjustment variables to account for the bias induced by adjusting for a collider.8 The potentially sufficient adjustment sets are then listed again and the minimal sufficient adjustment sets are suggested by the program.
The DAG program is written in C/C++ as an MS-DOS program. A copy of the program and further information may be downloaded from http://epi.dlife.de/dag.
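The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' C/C++ implementation: the function names, the edge-list input format, and the example graph are all hypothetical. The sketch excludes all descendants of the exposure from the candidate covariates, and it treats a sufficient set as minimal when no proper subset of it is sufficient (the sets with the fewest covariates are among these).

```python
from itertools import combinations

def has_cycle(edges):
    """True if the directed graph of (parent, child) pairs has a closed loop."""
    children = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
        children.setdefault(b, [])
    state = {}  # node -> "open" (on the current search path) or "done"
    def visit(node):
        state[node] = "open"
        for nxt in children[node]:
            if state.get(nxt) == "open" or (nxt not in state and visit(nxt)):
                return True
        state[node] = "done"
        return False
    return any(n not in state and visit(n) for n in children)

def descendants(edges, node):
    """All nodes reachable from node along directed edges."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in found:
                found.add(b)
                stack.append(b)
    return found

def backdoor_paths(edges, exposure, outcome):
    """Backtracking search for all paths that start with an arrow into exposure."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    parents = {a for a, b in edges if b == exposure}
    paths = []
    def extend(path):
        if path[-1] == outcome:
            paths.append(list(path))
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt in path or (len(path) == 1 and nxt not in parents):
                continue
            path.append(nxt)   # add a candidate step
            extend(path)
            path.pop()         # backtrack and try an alternative
    extend([exposure])
    return paths

def is_blocked(path, edges, adjusted):
    """True if the path is blocked given the adjusted covariates."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if collider:
            # A collider blocks unless it or one of its descendants is adjusted for.
            if not ({mid} | descendants(edges, mid)) & adjusted:
                return True
        elif mid in adjusted:
            return True
    return False

def minimal_sufficient_sets(edges, exposure, outcome):
    if has_cycle(edges):
        raise ValueError("graph contains a closed loop and is not a DAG")
    nodes = {n for edge in edges for n in edge}
    candidates = sorted(nodes - {exposure, outcome} - descendants(edges, exposure))
    paths = backdoor_paths(edges, exposure, outcome)
    minimal = []
    for size in range(len(candidates) + 1):
        for combo in combinations(candidates, size):
            chosen = set(combo)
            if any(found < chosen for found in minimal):
                continue  # a smaller sufficient set is already contained in it
            if all(is_blocked(p, edges, chosen) for p in paths):
                minimal.append(chosen)
    return minimal

# Hypothetical example: confounder C, mediator M, and a collider L on the
# path X <- U1 -> L <- U2 -> Y.
EDGES = [("C", "X"), ("C", "Y"), ("X", "M"), ("M", "Y"),
         ("U1", "X"), ("U1", "L"), ("U2", "L"), ("U2", "Y")]

if __name__ == "__main__":
    print(minimal_sufficient_sets(EDGES, "X", "Y"))
```

In this hypothetical graph the only minimal sufficient adjustment set is {C}. Adjusting for C and the collider L together is not sufficient, because conditioning on L opens the path through U1 and U2; adding U1 (or U2) to the set blocks that path again, which is the kind of additional adjustment variable described above.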
1. Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. Int J Epidemiol. 1986;15:413–419.
2. Rothman KJ, Greenland S, Lash TL. Validity in epidemiologic studies. In: Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams and Wilkins; 2008:128–147.
3. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48.
4. Greenland S, Brumback B. An overview of relations among causal modelling methods. Int J Epidemiol. 2002;31:1030–1037.
5. Hernán MA, Hernández-Díaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002;155:176–184.
6. Glymour MM, Greenland S. Causal diagrams. In: Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams and Wilkins; 2008:183–209.
7. Skiena SS. Combinatorial search and heuristic methods. In: Skiena SS. The Algorithm Design Manual. 2nd ed. New York: Springer; 2008.
8. Pearl J. Causality: Models, Reasoning, and Inference. New York: Cambridge University Press; 2000.
© 2010 Lippincott Williams & Wilkins, Inc.