# Observational Studies, Confounders, and Stratification

Unlike in controlled experiments, the researcher in an observational study has NO control over who is assigned to the treatment and control groups. In observational studies, oftentimes either:

• The subjects themselves decide if they get the treatment or not.
OR
• Fate determines who gets the treatment and who does not.

In both cases, the researcher just observes what happens.

## Main Problem with Observational Studies

Observational studies are done out of necessity. Whenever possible, it’s better to do a randomized controlled double-blind experiment. Why is this the case?
Observational studies can show correlation, but correlation does not imply causation.

Observational studies can show an association, but it’s difficult to draw conclusions about causality. Since the treatment and control groups just "happened," they are often very different from each other.

### Confounding Variables

Differences between the treatment and control groups, other than the treatment itself, that also affect the response are called confounders. Confounders can mix up the results of a study when you try to reach a conclusion. They are very common in observational studies. It’s good to be aware of possible confounders so that you don’t lie with statistics.

In observational studies, there are two scenarios:

1. The treatment truly caused the response. If so, there will be a causal link explaining how or why the treatment itself is causing the response. There is a direct link between the treatment and the response.

2. The treatment did not cause the response. Instead, there is a confounder that’s related to the treatment that is also causing the response. It is making it look like the treatment caused the response when in reality, the confounder caused the response.
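The second scenario can be sketched with a small made-up dataset. All names and counts below (coffee as the treatment, heart disease as the response, smoking as the confounder) are hypothetical and chosen only for illustration. By construction, heart disease depends only on smoking, yet the crude comparison still makes coffee look harmful because coffee drinkers are far more likely to be smokers.

```python
# Hypothetical counts: (smoker, drinks_coffee) -> (people, people with heart disease).
# By construction the response depends ONLY on smoking:
# 50% of smokers and 10% of non-smokers have heart disease, coffee or not.
counts = {
    (True, True): (80, 40),
    (True, False): (20, 10),
    (False, True): (20, 2),
    (False, False): (80, 8),
}


def rate(rows):
    """Share of people with the response across (people, cases) pairs."""
    people = sum(n for n, _ in rows)
    cases = sum(c for _, c in rows)
    return cases / people


# Split into treatment (coffee) and control (no coffee), ignoring smoking.
treated = [v for (smoker, coffee), v in counts.items() if coffee]
control = [v for (smoker, coffee), v in counts.items() if not coffee]

crude_treated = rate(treated)  # 42 cases out of 100 people
crude_control = rate(control)  # 18 cases out of 100 people

# The two groups "just happened", so they differ on the confounder itself.
smokers_in_treated = counts[(True, True)][0] / sum(n for n, _ in treated)
smokers_in_control = counts[(True, False)][0] / sum(n for n, _ in control)

print(f"Heart disease rate, coffee drinkers:     {crude_treated:.0%}")
print(f"Heart disease rate, non-coffee drinkers: {crude_control:.0%}")
print(f"Smokers among coffee drinkers:           {smokers_in_treated:.0%}")
print(f"Smokers among non-coffee drinkers:       {smokers_in_control:.0%}")
```

In this sketch the coffee drinkers show a 42% rate against 18% for everyone else, even though coffee has no effect at all in the made-up data. The gap comes entirely from the fact that 80% of coffee drinkers smoke, compared with 20% of non-coffee drinkers.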

## How To Handle Confounding Variables

Observational studies are done out of necessity because, sometimes, it’s not possible to do a randomized controlled experiment. The question is, can we still draw conclusions from an observational study?
Good studies take great care to reduce confounding, and there are many ways to do this! One common way is through a technique called stratification.

### Stratification

Statisticians adjust for these confounding variables by dividing the treatment and control groups into smaller, more homogeneous subgroups, where the confounding factor is the same. This is called stratification. Stratification plays a similar role in observational studies as blocking does in randomized experiments: it helps us deal with confounders.

With stratification, we can compare subjects in the treatment group to similar subjects in the control group. If you think a variable could confound your results, you should stratify on that variable.

With observational studies, you need a much bigger sample size than you do with randomized experiments because, each time you stratify for a possible confounder, the comparison groups get smaller and smaller. You can stratify as many times as needed, as long as the subgroups stay large enough to compare.
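Here is a minimal sketch of stratification in plain Python. The scenario and every number in it are hypothetical: a study app is the treatment, passing an exam is the response, and having taken a prerequisite course is the suspected confounder. The helper names `make_records` and `pass_rates` are made up for this example.

```python
from collections import defaultdict


def make_records(prereq, app, n, passed):
    """Expand a count into n individual records, `passed` of which passed."""
    return [
        {"prereq": prereq, "app": app, "passed": i < passed}
        for i in range(n)
    ]


# Hypothetical subjects: (took prerequisite, used app, people, people who passed).
records = (
    make_records(True, True, 60, 48)
    + make_records(True, False, 20, 16)
    + make_records(False, True, 20, 8)
    + make_records(False, False, 60, 24)
)


def pass_rates(rows):
    """Return {app_used: (group size, pass rate)} for treatment and control."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["app"]].append(row["passed"])
    return {app: (len(v), sum(v) / len(v)) for app, v in groups.items()}


# Crude comparison: ignore the confounder.
crude = pass_rates(records)

# Stratified comparison: split on the confounder first, then compare
# treatment and control within each more homogeneous subgroup.
strata = defaultdict(list)
for row in records:
    strata[row["prereq"]].append(row)
stratified = {level: pass_rates(rows) for level, rows in strata.items()}

print("All subjects:", crude)
for level, result in stratified.items():
    print(f"prereq={level}:", result)
```

With these made-up counts, the crude comparison shows a 70% pass rate for app users against 50% for non-users. Within each stratum the gap disappears: 80% against 80% for students who took the prerequisite, and 40% against 40% for those who did not. The group sizes also show the cost of stratifying: the crude comparison uses 80 subjects per group, while the smallest stratified group has only 20.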

# Example Walk-Throughs with Worksheets

### Video 1: Observational Studies Examples

Follow along with the worksheet to work through the problem:

### Video 2: Experimental Design Examples

Follow along with the worksheet to work through the problem:

### Video 3: Stratification Examples

Follow along with the worksheet to work through the problem:

# Practice Questions

Q1: Some observational studies show that people who drink energy drinks tend to get hurt more often. State whether the following is a confounder, causal link, neither, or both: Genetics - Some people are more genetically prone to injuries than others.
Q2: Some observational studies show that people who drink energy drinks tend to get hurt more often. State whether the following is a confounder, causal link, neither, or both: Harmful ingredients - Energy drinks have harmful ingredients that decrease white blood cells, which can lead to injuries.
Q3: Some observational studies show that people who drink energy drinks tend to get hurt more often. State whether the following is a confounder, causal link, neither, or both: Exercise - People who exercise are more likely to drink energy drinks and exercising makes people more prone to injuries.
Q4: Suppose we want to do an observational study to test the effectiveness of a drug that is supposed to improve focus. There are 30 subjects in this observational study; however, 20 subjects work from home and 10 work in the office. We think this could affect the response. How would you design this study and use stratification to compare the treatment and control groups at the end?
Q5: An observational study is best defined as:
Q6: Stratification is done at:

# Mastery-Based Assessment

A mastery-based assessment is available for Observational Studies, Confounders, and Stratification:
1. Access PrairieLearn (prairielearn.org)
2. Complete the `m1-05` Observational Studies, Confounders, and Stratification mastery assessment on PrairieLearn
3. Continue to master material and earn 100% mastery on all assessments in the "Basics of Data Science with Python" section to earn the Basics of Data Science with Python Mastery Badge!