3  M: Tools for Thinking About Causality

Load R-libraries

Code
library(dagitty) # R implementation of http://www.dagitty.net

Topics

  • Causality
    • Causal terms
    • Causal questions
    • Causal inference
    • Causal claims
      • If you mean it, say it!
  • Causal effect
    • Potential outcomes
    • Single-unit causal effect
    • Average causal effect
  • Directed Acyclic Graph (DAG)
    • Chain: \(X \rightarrow Z \rightarrow Y\) (Z is a mediator)
    • Fork: \(X \leftarrow Z \rightarrow Y\) (Z is a confounder)
    • Inverted fork: \(X \rightarrow Z \leftarrow Y\) (Z is a collider)
    • Total, indirect, and direct causal effect
  • Cause-probing research designs
    • Experimental designs: Manipulation, Assignment mechanism known
    • Quasi-experimental designs: Manipulation, Assignment mechanism unclear
    • Non-experimental designs: No Manipulation, Assignment mechanism unclear

Theoretical articles to read:

  • Hernán (2018) on how we speak about causality
  • Steiner et al. (2023) on frameworks for causal inference, the sections on DAGs and potential outcomes
  • Have a look at the dagitty homepage
  • Rohrer (2018) on causal inference in observational studies, main points made using DAGs


Here is a tentative definition of causality to get us started:

X causes Y if a change in X would lead to a change in Y for at least one unit of observation.

We may ask different types of causal questions, for example:

  1. What causes Y?
  2. Does X cause Y?
  3. How large is the causal effect of X on Y? (for a single unit, or on average)
  4. How does X cause Y? (mediation)
  5. Does the causal effect depend on other variables? (moderation = interaction)

Causal inference is the pursuit of answering such questions. We will primarily focus on question 3, the estimation of causal effects. In this context, causal inference involves estimating the extent to which an outcome variable (Y) would change in response to a given change in the exposure variable (X). Note that question 3 encompasses question 2 as a special case: if the estimated effect differs from zero, the answer to question 2 is “yes”; otherwise, it is “no.”


3.1 Speaking about causality

  • “Association does not imply causation.”
  • “Where there is association, there is causation.”
  • “No correlational smoke, without causal fire.”

Justifying the claim that an observed association between X and Y is a good estimate of the causal effect of X on Y requires much more than just reporting the association. There are typically plenty of alternative, non-causal explanations of an association to deal with before the causal claim can be considered justified. This is particularly true for observational studies, with no experimental manipulation and no information on how individuals came to be exposed to the causal factors under study. For example, observational studies have found that occupational noise exposure is associated with increased risk of heart disease. But perhaps part or all of this association is explained by other factors that differ between people working in noisy and in quiet environments. Even if we have tried to adjust for some of these differences, such as education and income, there may always be unmeasured factors that we have no control over.

Sometimes researchers prefer to play it safe and avoid the terms cause or causality in favor of terms like associated, linked, or predicts. And sometimes they forget themselves and use causal terms anyhow: influence, change, increase, amplify, reduce, decrease, affect, effect, moderate, and so on. It is very hard to write an article on a causal research question without using any causal terms.

Hernán (2018) argues well for why we should say it if we mean it. In doing so, he cites Kenneth Rothman, who said it 40 years ago:

Some scientists are reluctant to speak so blatantly about cause and effect, but in statements of hypothesis and in describing study objectives such boldness serves to keep the real goal firmly in focus and is therefore highly preferable to insipid statements about ‘association’ instead of ‘causation’.

Rothman, cited in Hernán (2018)


3.2 Causal effects and potential outcomes

“Causality” is a tricky concept; we all know what it is, but no one can really define it. The Potential Outcome Model of causality (PO) simply avoids defining causality, and instead defines “causal effect”:

  • Causal effect is defined as a contrast between potential outcomes. Typically, \(y_i^1 - y_i^0\) or \(\frac{y_i^1}{y_i^0}\), where \(y_i^1\) is the outcome under Treatment 1, and \(y_i^0\) is the outcome under Treatment 0, for study unit \(i\). Example: The causal effect of me drinking a cup of coffee on my perceived tiredness right now is how tired I would be if I drank the coffee right now compared to how tired I would be if I didn’t drink the coffee right now.
  • Fundamental Problem of Causal Inference (Holland, 1986): Only one of the two (\(y_i^1\) or \(y_i^0\)) can be observed; the other is counterfactual. This perspective reduces a hard problem (how to define “causality”) to the seemingly simpler problem of how to deal with missing data.
  • Solution to the fundamental problem: Find a substitute that is equivalent, ceteris paribus, except for the treatment condition. Either on the single-unit level (e.g., retest the same unit, or test an “identical twin”) or on the group level (two groups that are balanced on all relevant variables, making the group assignment “ignorable”).
  • PO highlights the need to define which causal effect is being estimated. We will mainly discuss two general types (both illustrated in the small simulation below):
    • Single-unit causal effect: the causal effect in each individual unit.
    • Average causal effect: the average single-unit causal effect in a sample or population.

Ceteris Paribus: “All other things being equal”

Warning: Terminology and notation in the potential outcome business vary a lot. I try to stick to the notation used by Gelman et al. (2021, Ch. 18).
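
To make this concrete, here is a small simulation sketch (with made-up numbers, not taken from any study) in which we play omniscient and know both potential outcomes for five hypothetical units. In real data, only one of the two potential-outcome columns would be observed for each unit.

Code
# Both potential outcomes for five hypothetical units (made-up numbers)
y0 <- c(6, 4, 7, 5, 8)            # outcome under Treatment 0 (control)
y1 <- c(4, 3, 5, 5, 7)            # outcome under Treatment 1
ce <- y1 - y0                     # single-unit causal effects
mean(ce)                          # average causal effect

# The fundamental problem: we observe only one potential outcome per unit
set.seed(1)
z <- rbinom(5, 1, 0.5)            # treatment indicator
y_obs <- ifelse(z == 1, y1, y0)   # the unobserved outcome is counterfactual
data.frame(z, y_obs, ce)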


Sometimes it’s easy

It’s important to note that the fundamental problem of causal inference doesn’t mean that causal inference is always difficult. In fact, sometimes it’s easy! For example, consider a situation where we are uncertain about the causal effect of a specific light switch in a hotel room. We turn the switch on (treatment) and observe that the bathroom light turns on. To confirm, we turn the switch off (control) and observe that the bathroom light turns off. This straightforward process is usually enough to convince us, beyond reasonable doubt, that the light switch has a causal effect on the bathroom light. We might refer to this as the “light-switch design,” but the more established term is the “Single-N design,” which will be discussed in upcoming seminars.

Another approach involves using identical copies of an object, exposing one to the treatment and the other to the control, and then comparing the outcomes. For instance, you could cut an iron bar into two pieces, place one piece in ordinary water and the other in salty water, and then compare the amount of rust on each piece. The difference in rust can be used to estimate the causal effect of exposure to salty versus non-salty water. We might call this the “identical twin design.”

Holland (1986) referred to both of these strategies as “scientific” approaches to causal inference, in contrast to the “statistical” approach, which involves comparing groups of individuals who have received different treatments (as in a randomized experiment). A key drawback of the statistical approach is that it can only provide estimates of average causal effects across a population, rather than estimates of single-unit causal effects—unless one assumes that there is no variation in individual responses to the treatment.
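
This point is easy to demonstrate in a simulation (an assumed toy setup, not Holland’s own example): even when single-unit effects vary widely, a randomized group comparison recovers the average causal effect, while the individual effects remain hidden.

Code
# Toy example: randomization recovers the average, not the individual, effects
set.seed(42)
n  <- 1e4
y0 <- rnorm(n, mean = 10)          # potential outcomes under control
ce <- rnorm(n, mean = 1, sd = 2)   # heterogeneous single-unit effects
y1 <- y0 + ce                      # potential outcomes under treatment

z     <- rbinom(n, 1, 0.5)         # randomized treatment assignment
y_obs <- ifelse(z == 1, y1, y0)    # only one outcome observed per unit

# Difference in group means estimates the average causal effect (about 1)
mean(y_obs[z == 1]) - mean(y_obs[z == 0])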


3.3 Directed Acyclic Graphs (DAGs)

A Directed Acyclic Graph (DAG) is a tool that helps us think clearly about causal and non-causal relationships between the variables relevant to a specific causal research problem.

Code
vvg_dag <- dagitty( "dag {
   VVG -> AGG
   VVG -> PA -> AGG
   PA -> BMI
   VVG <- Support -> SProb -> AGG
   SProb <- Genetic -> AGG
}")

coordinates(vvg_dag) <- list(
  x = c(VVG = 1, Support = 1, BMI = 2, PA = 3, SProb = 3, Genetic = 4, AGG = 4),
  y = c(VVG = 5, Support = 1, BMI = 3, PA = 4, SProb = 2, Genetic = 1, AGG = 5))

plot(vvg_dag)

DAGs are graphs that consist of nodes, representing variables, and arrows, representing causal effects.

  • Directed. Only single-headed arrows are allowed. This makes sense because we typically assume that causality flows in one direction: Smoking \(\rightarrow\) Lung cancer.
  • Acyclic. You can never follow the arrows and get back to where you started. This makes sense because we think of causal factors as preceding their outcomes in time.
  • Graph. It is a graph, both in the everyday sense of a figure and in the mathematical sense that allows interesting information to be derived from it using smart algorithms (see the example below).
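
As a small taste of those algorithms, we can query the graph object defined in the code chunk above, for example with the dagitty functions parents() and ancestors():

Code
# Query the vvg_dag graph object defined above
parents(vvg_dag, "AGG")    # direct causes of AGG in the model
ancestors(vvg_dag, "AGG")  # ancestors of AGG (nodes with a directed path to AGG)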


The simple DAG above shows my model of how these variables are causally related:

  • Exposure: Amount of Violent video gaming in childhood (VVG)
  • Outcome: Adult aggression (AGG)
  • Covariates: Physical activity (PA), Parental support (Support), School problems (SProb), Weight (BMI), Genetic factor (Genetic).

Longitudinal study: Exposure and covariates measured in childhood, outcome in adulthood.


Please draw the DAG above using the online tool at dagitty.net

(Figure: the DAG drawn using the tool on dagitty.net)

Jumping ahead: The DAG is a mathematical object (a graph) that we may ask questions about, e.g., using dagitty. Here I use the function dagitty::adjustmentSets() to find small sets of variables that I need to condition on to estimate the total causal effect of the exposure, VVG, on the outcome, AGG.

Code
# Get minimal sufficient adjustment sets
adjustmentSets(vvg_dag, exposure = 'VVG', outcome = 'AGG', 
               type = "minimal", effect = "total")
{ Genetic, SProb }
{ Support }


Total, direct and indirect causal effect

In the DAG above, the total causal effect of VVG on AGG is the combined effect of the direct causal effect \(VVG \rightarrow AGG\) and the indirect causal effect \(VVG \rightarrow PA \rightarrow AGG\).

If we assume linear causal relationships with \(a\), \(b\), and \(c\) as regression coefficients, that is, the average change in the outcome for a one-unit change in the causal factor:

  • \(a\), for \(VVG \rightarrow PA\) (part of indirect effect),
  • \(b\), for \(PA \rightarrow AGG\) (part of indirect effect),
  • \(c\), for \(VVG \rightarrow AGG\) (direct effect),

then the causal effects would be:

  • Direct causal effect: \(c\)
  • Indirect causal effect: \(a \times b\)
  • Total causal effect = direct effect + indirect effect = \(c + a \times b\)


Here is a simple simulation with three variables, \(X, M, Y\), and coefficients \(a, b, c\):

  • direct causal effect (\(X \rightarrow Y\)): \(c = -0.5\).

  • indirect causal effect (\(X \rightarrow M \rightarrow Y\)): \(a \times b = 0.5 \times 0.6 = 0.3\).

  • total causal effect: \(c + a \times b = -0.5 + 0.3 = -0.2\). That is, increasing \(X\) by one unit will decrease \(Y\) by 0.2 units.

Code
# Simulate data
set.seed(123)
n <- 1e5
x <- rnorm(n)  # Exposure variable
m <- rnorm(n) + 0.5 * x  # Mediator variable
y <- rnorm(n) + -0.5 * x  + 0.6 * m # Outcome variable

# Run a regression analysis, to find total causal effect of X on Y
fit0 <- lm(y ~ x)

# Adding m to the model, to find direct effect of X on y
fit1 <- lm(y ~ x + m)

# Show causal effect estimates : coefficients for x
round(c(total = coef(fit0)[2], direct = coef(fit1)[2]), 2)
 total.x direct.x 
    -0.2     -0.5 


Properties of DAGs

  1. Non-parametric (does not assume any specific type of relationships between variables)
    • Note: This implies that DAGs do not distinguish between additive effects and interactions (moderation) between variables. Thus, \(X \rightarrow Y \leftarrow Z\) is consistent with independent and additive effects of \(X\) and \(Z\) on \(Y\) as well as any type of interaction effect between \(X\) and \(Z\) on \(Y\).
    • DAGs are related to Path analysis and Structural Equation Modelling (SEM), but unlike DAGs, these statistical methods do assume linear relationships between variables
  2. Chains, forks, and inverted forks are the main components of any DAG
    • Chain: \(X \rightarrow Z \rightarrow Y\) (Z is a mediator)
    • Fork: \(X \leftarrow Z \rightarrow Y\) (Z is a confounder)
    • Inverted fork: \(X \rightarrow Z \leftarrow Y\) (Z is a collider)
  3. Consist of causal and non-causal paths (see the code sketch after this list):
    • A causal path is a path where all arrows point toward the causal outcome. Example: \(X \rightarrow M \rightarrow W \rightarrow T \rightarrow Y\) is a causal path from \(X\) to \(Y\).
    • A non-causal path is a path where at least one arrow points against the “causal flow”. Example: \(X \leftarrow M \leftarrow W \rightarrow T \rightarrow Y\) is a non-causal path between \(X\) and \(Y\) (\(W\) is a confounder of the relationship between \(X\) and \(Y\), because it causes both, through the two “proxy” confounders \(M\) and \(T\)).
  4. The absence of an arrow between two nodes (variables) is a STRONG causal claim
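
As announced in point 3, dagitty can enumerate the paths for us. Applied to the VVG example from earlier, paths() lists every path between exposure and outcome, together with a flag for whether each path is open (given an empty conditioning set):

Code
# List all paths between VVG and AGG in the DAG defined earlier;
# the result contains the paths and whether each one is open
paths(vvg_dag, from = "VVG", to = "AGG")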


3.4 Cause-probing research designs

One way to categorize cause-probing research designs is by their relative strength in terms of internal validity. The strongest designs involve a distinct manipulation of the causal factor of interest and fully understood assignment of participants to conditions (assignment is ‘ignorable’). Slightly weaker designs still involve manipulation of the causal factor, but the assignment mechanism may be unclear, potentially introducing bias. The weakest designs neither manipulate the causal factor distinctly nor employ a known or controlled assignment mechanism.

The list below ranks designs from strongest to weakest, though this ranking is tentative. A well-executed study with a lower-ranked design may yield more reliable results than a poorly executed study with a higher-ranked design.

  • Experimental designs: Manipulation, Assignment mechanism known (“ignorable”)
    • Within-subject design
      • Targeting single-unit causal effects (Single-N design)
      • Targeting average causal effect
    • Between-subject design (randomized experiment)
    • Mixed within-between-subject design
  • Quasi-experimental designs: Manipulation, Assignment mechanism unclear
    • Natural experiment
    • (Regression discontinuity designs)
    • (Instrumental variable design)
    • (Difference-in-difference designs)
  • Non-experimental designs: No Manipulation, Assignment mechanism unclear
    • Longitudinal designs
      • Cohort study
      • Case-control study
    • Cross-sectional design

(Designs in parentheses are not covered in this course.)


Practice

The practice problems are labeled Easy (E), Medium (M), and Hard (H), as in McElreath (2020).

Easy

3E1. I overheard this conversation:
Holland: The fundamental problem is that you can never measure both. You’ll always have missing data.

Bellman: Nonsense, you just measure twice, once with and once without treatment.

Heraclitus: You could not step twice into the same river.

What was this all about?

(Image: Heraclitus)

Footnote: This is an old exam question. Holland as in Holland (1986)


3E2. “With increasing traffic volumes, exposure to residential road-traffic noise has increased substantially over the last decades. In the same period, we have seen a remarkable reduction in the number of heart attacks. Thus, road-traffic noise cannot cause heart attacks, as some noise researchers seem to suggest.”

What is wrong with this argument? Explain in terms of the Potential outcome model of causality.


3E3. “Every 60 years, two cycles within the Asian zodiac calendar - one over twelve months and one over five elements - generate a year of the ‘fire horse’. A folk belief exists that families who give birth to babies designated as fire horses will suffer untold miseries, particularly so if the baby is a girl” (Morgan & Winship (2015), p. 65).

The figure below is adapted from Morgan & Winship’s Fig. 2.1.
Guess which year was a “fire horse”! (answer: 1966)

Make a reasonable estimate of the causal effect of the folk belief on the birth rate in year 1966. Define causal effect, ce, as

\(ce_i = y_i^1 - y_i^0\),

where \(y_i^1\) is the observed birth rate in year \(i=1966\) and \(y_i^0\) is the counterfactual birth rate in year 1966 had this year not been a fire horse.

Code
# Data
year <- 1951:1980
birth_rate <- c(25.3, 23.4, 21.5, 20.0, 19.4, 18.4, 17.2, 18.0, 17.5, 17.2,
                16.9, 17.0, 17.3, 17.7, 18.6, 13.7, 19.4, 18.6, 18.5, 18.8, 
                19.2, 19.3, 19.4, 18.6, 17.1, 16.3, 15.5, 14.9, 14.2, 13.6)

# Plot
plot(year, birth_rate, pch = '', ylim = c(12, 26), xlab = 'Year', 
     ylab = 'Birth rate (per 1,000 inhabitants)', las = 1, axes = FALSE )
axis(1, at = year, las = 1, tck = 0.01, cex.axis = 0.8) 
axis(2, at = seq(12, 26, by = 2), las = 1, tck = 0.01, cex.axis = 0.8)
lines(year, birth_rate)
points(year, birth_rate, pch = 21, bg = 'grey')

# Prepare adding birth-rate number below data symbols in plot
ytext <- birth_rate - 0.5

# Special treatment of data points 1, 15 and 17 (to avoid numbers on line)
ytext[c(1, 15, 17)] <- c(birth_rate[1], birth_rate[15] + 0.5, birth_rate[17] + 0.5)
xtext <- year
xtext[1] <- 1952

# Add birth_rate numbers 
text(xtext, ytext, birth_rate, cex = 0.5)

# Add gridlines
abline(v = year, col = "gray", lty = 3)


3E4. The effect estimated in 3E3: would you call it an average causal effect or a single-unit causal effect? Motivate your answer.


3E5. Draw Directed Acyclic Graphs (DAGs) involving three variables: X (exposure), Y (outcome), and Z (a third variable), to represent the following scenarios:

  1. X causes Y, and Z causes both X and Y.
  2. X causes Z, and Z causes Y.
  3. X and Y are independent, but both cause Z.
  4. Both X and Z cause Y.
  5. For each of the above, label the variable Z as a “confounder,” “mediator,” “collider,” or “competing exposure,” based on its role in the causal model.


3E6. In the DAG below, X is the exposure variable and Y is the outcome variable.

  1. Count the number of causal paths between X and Y.
  2. Count the number of non-causal paths (backdoors linking X and Y).
  3. Sometimes, variables like Z3 are called proxy confounders. Why?
  4. On what path is Z3 a collider?
Code
e5dag <- dagitty( "dag {
   X -> Y
   X -> Z1 -> Y
   X <- Z2 -> Z3 -> Y
   Z3 <- Z4 -> Y
}")

coordinates(e5dag) <- list(
  x = c(X = 1, Z2 = 1, Z1 = 3, Z3 = 3, Z4 = 4, Y = 4),
  y = c(X = 5, Z2 = 1, Z1 = 4, Z3 = 2, Z4 = 1, Y = 5))

plot(e5dag)


Medium

3M1. Go back to the DAG in 3E6:

  1. Would the total average causal effect of X on Y be identified if you could adjust for Z2? (adjustment set: {Z2})
  2. Assume that Z2 is unmeasured and cannot be adjusted for; would it be sufficient to adjust for Z3 (adjustment set: {Z3})?
  3. How about adjustment set {Z3, Z4}? (A code hint for checking your answers follows below.)
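
Hint: after you have reasoned through the answers on your own, you can check them in R with dagitty’s isAdjustmentSet(); output is deliberately not shown here.

Code
# Check candidate adjustment sets for the DAG in 3E6 (answers not shown)
isAdjustmentSet(e5dag, "Z2", exposure = "X", outcome = "Y")
isAdjustmentSet(e5dag, "Z3", exposure = "X", outcome = "Y")
isAdjustmentSet(e5dag, c("Z3", "Z4"), exposure = "X", outcome = "Y")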


3M2.

  1. Think of an example in which you would expect all single-unit causal effects to be the same.
  2. Think of an example in which you would expect both positive and negative single-unit causal effects.
  3. Think of an example in which the average causal effect would be representative of most individuals.
  4. Think of an example in which the average causal effect would not be representative of any individual.


3M3. Are you better off than you were four years ago? This question became famous during the 1980 U.S. presidential campaign, when Ronald Reagan used it in a debate against the incumbent president, Jimmy Carter. Now, Vice President Kamala Harris faces the same question from former president Donald Trump, and Harris seems to struggle to find a convincing response. How would you have answered? Apply your understanding of the potential outcome model of causality!


Hard

3H1.
Think of a simple scenario with one direct and one indirect causal effect of exposure X on outcome Y (see DAG below).

  1. Is it possible for the total causal effect to be less than the direct effect?
  2. Is it possible for the total causal effect to be close to zero despite a strong direct effect?
  3. Is it possible for the total causal effect to have a different sign than the indirect causal effect?
Code
h1dag <- dagitty( "dag {
   X -> Y
   X -> M -> Y
}")

coordinates(h1dag) <- list(
  x = c(X = 1, M = 2,  Y = 3),
  y = c(X = 1, M = 0,  Y = 1))

plot(h1dag)


3H2. Construct Directed Acyclic Graphs (DAGs) of the following scenarios.

  1. Z is a mediator in the pathway from X to Y, but Z is also influenced by a variable W that affects both Z and Y.
  2. Both X and Z cause Y, and both X and Y are affected by a common cause W.
  3. X causes W, which causes both Z and Y.
  4. For each of the above, discuss how conditioning on Z could bias, or reduce bias in, the estimate of the causal effect of X on Y.


3H3.

  1. Simulate a data set with observations from 100 individuals on three variables X, Y, and Z that is consistent with this DAG: \(X \leftarrow Z \rightarrow Y\)

  2. Use statistical analyses to check the degree of association between X and Y,

  3. … and between X and Y after controlling for Z.


(Session Info)

Code
sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=Swedish_Sweden.utf8  LC_CTYPE=Swedish_Sweden.utf8   
[3] LC_MONETARY=Swedish_Sweden.utf8 LC_NUMERIC=C                   
[5] LC_TIME=Swedish_Sweden.utf8    

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dagitty_0.3-4

loaded via a namespace (and not attached):
 [1] digest_0.6.37     fastmap_1.2.0     xfun_0.52         knitr_1.50       
 [5] htmltools_0.5.8.1 rmarkdown_2.29    cli_3.6.5         compiler_4.4.2   
 [9] boot_1.3-31       rstudioapi_0.17.1 tools_4.4.2       curl_6.4.0       
[13] evaluate_1.0.3    Rcpp_1.0.14       yaml_2.3.10       rlang_1.1.6      
[17] jsonlite_2.0.0    V8_6.0.4          htmlwidgets_1.6.4 MASS_7.3-61