Overview of Simulation


Simulation is one of the most important tools in Data Science because it lets us answer questions that we may not have the skills, understanding, or certainty to solve mathematically. Everything from developing new medications to launching rockets to prototyping fashion trends is simulated every day to speed up production and improve safety in almost every industry!

Creating a Simulation

To create a simulation, we must identify every real-world factor that affects the outcome. For a complex event, there may be millions of individual factors that affect the results (ex: launching a rocket involves the temperature of the fuel, the pressure of the fuel, the temperature outside, the humidity, and millions of other individual factors). However, we can begin to understand simulation by simulating simple events.

Simulation of a Die Roll

An extremely simple simulation is the simulation of rolling a six-sided die. If we assume the die is completely fair -- that it is equally likely to land on each side -- then the only factor is which side lands face up.

A single simulation is simply randomly choosing a number from [1, 2, 3, 4, 5, 6]. When you learned about Random Numbers in Python, you learned we can do this in Python using random.randint():

import random

# Randomly generates a number in the range 1-6, including the end points:
random.randint(1, 6)


Multiple Simulations

A single simulation will reveal one possible result, but simulations become really useful when we run them many times so that we understand all possible results and the expected likelihood of each result.

When I ran random.randint(1, 6) 600 times, Python provided me the following results:

98 times we rolled a 1,
97 times we rolled a 2,
104 times we rolled a 3,
98 times we rolled a 4,
101 times we rolled a 5,
102 times we rolled a 6.

This distribution -- the likelihood of each result -- shows us that it is approximately equally likely for each of the six sides to show up. (In the very next section, "For Loops in Python", you'll learn the Python technique to repeat a block of code many times to simulate this yourself!)
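
As a preview, the 600-roll experiment can be sketched with a short loop. This is a minimal sketch, not the code that produced the counts listed above: the helper name roll_many and the fixed seed are assumptions made for illustration, and your own counts will differ.

```python
import random
from collections import Counter


def roll_many(num_rolls, seed=None):
    """Roll a fair six-sided die num_rolls times and count each face.

    roll_many is a hypothetical helper; seed only makes a run repeatable.
    """
    rng = random.Random(seed)  # separate generator, global state untouched
    counts = Counter()
    for _ in range(num_rolls):
        counts[rng.randint(1, 6)] += 1  # randint includes both end points
    return counts


counts = roll_many(600, seed=42)  # the seed value is an arbitrary assumption
for face in range(1, 7):
    print(f"{counts[face]} times we rolled a {face}")
```

Running it again with a different seed (or with no seed) gives slightly different counts each time, which is the simulation error discussed next.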

Simulation Error

Unlike calculating probabilities mathematically, simulation will always result in some error since we sample (simulate) only a fixed number of events. In the example where we rolled a six-sided die 600 times, mathematically we know the expected number of times we roll any specific side is exactly 100 (600 rolls × 1/6). In our simulation, the distribution nearly matched the expected counts but had some error (ex: we rolled a "1" 98 times instead of 100).
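
The error in each count can be worked out directly from the 600-roll results listed earlier. A small sketch using those reported counts; the variable names are illustrative only:

```python
# Counts reported for the 600 simulated rolls in the previous section.
observed = {1: 98, 2: 97, 3: 104, 4: 98, 5: 101, 6: 102}

total_rolls = sum(observed.values())    # 600
expected = total_rolls / len(observed)  # 600 / 6 = 100.0 per side

# Error = how far each simulated count landed from the expected count.
errors = {face: count - expected for face, count in observed.items()}

for face, error in errors.items():
    print(f"Side {face}: rolled {observed[face]} times, error {error:+.0f}")
```

The errors add up to zero because every roll that one side "gains" is a roll another side "loses".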

In general, the error of a simulation is affected by the number of simulations and the likelihood of the result:

The more likely a result is to occur, the smaller the simulation error relative to that likelihood. (We usually can't control this.)
The more simulations we perform, the smaller the simulation error. (We can control this!)

In general, you want to simulate as many times as reasonably possible to get the most accurate result. Later you will learn that the Central Limit Theorem tells us that the error decreases in proportion to one over the square root of the number of simulations. This means that you need four times as many simulations to cut the error in half.
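
The square-root relationship can be checked with a few lines of arithmetic. The sketch below assumes the standard formula for the typical error (standard deviation) of a simulated proportion, sqrt(p * (1 - p) / n), which is not derived in this section; typical_error is a hypothetical helper name.

```python
import math


def typical_error(p, n):
    """Standard deviation of the simulated proportion for a result with
    probability p after n simulations (assumed: sqrt(p * (1 - p) / n))."""
    return math.sqrt(p * (1 - p) / n)


p_side = 1 / 6  # chance of any one side of a fair die

for n in (600, 2400, 9600):
    print(f"{n} rolls: typical error of about {typical_error(p_side, n):.4f}")

# Four times as many simulations should give half the error.
ratio = typical_error(p_side, 2400) / typical_error(p_side, 600)
print(f"error ratio (2,400 vs 600 rolls): {ratio:.2f}")
```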

To halve the simulation error of our 600 rolls, we would need to roll a total of 2,400 dice (1,800 more die rolls!).
To halve the simulation error again, we would need to roll a total of 9,600 dice!
This means we would need to roll 9,000 more dice just to reduce the error of our initial 600 die rolls by 4x.
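
The roll counts in the list above follow one pattern: each halving of the error multiplies the total number of rolls by four. A short sketch of that arithmetic, where rolls_for_halvings is a hypothetical helper name:

```python
def rolls_for_halvings(initial_rolls, halvings):
    """Total rolls needed to halve the simulation error `halvings` times,
    assuming the error shrinks with the square root of the number of rolls."""
    return initial_rolls * 4 ** halvings


start = 600
for halvings in (1, 2):
    total = rolls_for_halvings(start, halvings)
    print(f"halve the error {halvings}x: {total} total rolls ({total - start} more)")
```

The same function works for any starting number of rolls, which may help when checking your answers to the practice questions.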

Key Takeaway: Simulation is Always an Estimation

Remember, you will ALWAYS get a more accurate result by working the probability out mathematically. However, that is often difficult or impossible to do, so we need to rely on simulation -- and we can even use simulations to check our math!

Practice Questions

Q1: To halve the simulation error of 4 die rolls, we would need 12 more die rolls. To halve the simulation error again, we would need how many additional die rolls?

Q2: To halve the simulation error of 10 die rolls, how many more dice would you need to roll?

Q3: How can we minimize simulation error?
