Law of Large Numbers


The law of large numbers is a powerful law in Data Science. It states that the average of our results will tend towards the expected value as we run more simulations (or trials). This law is incredibly valuable and is the key idea behind why we can use simulation to predict the outcomes of complex events.

We can see the law of large numbers in action by simulating a simple game where someone randomly chooses one of three equally likely numbers: 1, 99, or 101 (just those three numbers!).

  • We can calculate the expected average mathematically: (1 + 99 + 101) / 3 = 201 / 3 = 67.
  • The law of large numbers tells us that the average of our simulations will tend closer to 67 the longer we run our simulation.
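The expected-value arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: it assumes each of the three numbers is equally likely (probability 1/3), and the variable names are illustrative rather than part of the original simulation.

```python
from fractions import Fraction

# The three possible outcomes of the game.
outcomes = [1, 99, 101]

# Assumption: every outcome is equally likely, so each has probability 1/3.
probability = Fraction(1, len(outcomes))

# Expected value = sum of (outcome * probability of that outcome).
expected_value = sum(x * probability for x in outcomes)

print(expected_value)  # 67
```

Using `Fraction` keeps the arithmetic exact, so the result is exactly 67 with no floating-point rounding.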

To test this, we'll run a simulation a total of 1,000,000 times and then analyze the average when including only the first 10 simulations, the first 100, the first 1,000, the first 10,000, and then all 1,000,000.

Simulation

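import random
import pandas as pd

# Play the game 1,000,000 times, recording the number chosen each time
data = []
for i in range(1000000):
  num = random.choice([1, 99, 101])
  d = { "num": num }
  data.append(d)
df = pd.DataFrame(data)
df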
        num
0        99
1         1
2       101
3        99
4        99
...     ...
999995  101
999996  101
999997  101
999998   99
999999   99

Simulation of 1,000,000 games randomly choosing a number from the set {1, 99, 101}.

First 10 Simulations

The following code finds the mean (average) of only the first 10 runs of the simulation:

df[0:10]["num"].mean()
90

The mean value for num of the first ten runs of the simulation.

Since the expected value is 67, this average is off by 23 (90 - 67)! With so few simulations (just 10), the average is not a very accurate approximation, and this is why we're always going to run thousands of simulations!

First 100 Simulations

df[0:100]["num"].mean()
73.18

The mean value for num of the first 100 runs of the simulation.

With an additional 90 simulations, the average is tending towards the expected value, with an error of only 6.18. This is still nearly 10% of the expected value (6.18 / 67 is about 9.2%), which is quite a bit of error.

First 1,000 Simulations

df[0:1000]["num"].mean()
68.01

The mean value for num of the first 1,000 runs of the simulation.

With an additional 900 simulations, the average continues to tend towards the expected value of 67, now with an error of just 1.01. The error has been reduced to less than 2% of the expected value.
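The error figures quoted so far can be reproduced with a short calculation. The sketch below is illustrative only: the helper names `absolute_error` and `relative_error` are hypothetical, and the means are copied from the outputs reported above rather than recomputed from a new simulation.

```python
EXPECTED_VALUE = 67  # (1 + 99 + 101) / 3


def absolute_error(mean, expected=EXPECTED_VALUE):
    """Distance between a simulated mean and the expected value."""
    return abs(mean - expected)


def relative_error(mean, expected=EXPECTED_VALUE):
    """Absolute error expressed as a fraction of the expected value."""
    return absolute_error(mean, expected) / expected


# Means reported above for the first 10, 100, and 1,000 runs.
reported_means = {10: 90, 100: 73.18, 1_000: 68.01}

for n, mean in reported_means.items():
    print(
        f"first {n:>5,} runs: error {absolute_error(mean):.2f} "
        f"({relative_error(mean):.1%} of the expected value)"
    )
```

With these reported means, the relative error works out to roughly 34.3%, 9.2%, and 1.5%, matching the "nearly 10%" and "less than 2%" descriptions.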

First 10,000 Simulations

df[0:10000]["num"].mean()
66.8696

The mean value for num of the first 10,000 runs of the simulation.

With 9,000 additional simulations, the error from the expected value is now only 0.1304, a tiny fraction of what it was with 10 or 100 simulations.

All 1,000,000 Simulations

df["num"].mean()
67.060252

The mean value for num using the entire simulation.

After 1,000,000 simulations, the average value is incredibly close to the expected value (simulation error of 0.060252). In this example, you saw the law of large numbers play out through the simulation!
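Because the simulation is random, rerunning it will give slightly different averages each time. As a hedged sketch (not the code used to produce the numbers above), the following standard-library version fixes a seed so the run is repeatable and reports the mean at every checkpoint in one pass. The helper name `prefix_means` and the seed value 2024 are arbitrary choices made for illustration.

```python
import random


def prefix_means(values, checkpoints):
    """Return {n: mean of the first n values} for each n in checkpoints."""
    wanted = set(checkpoints)
    means = {}
    running_total = 0
    for count, value in enumerate(values, start=1):
        running_total += value
        if count in wanted:
            means[count] = running_total / count
    return means


# Assumption: a fixed seed (2024 is arbitrary) so the run can be repeated.
rng = random.Random(2024)
draws = [rng.choice([1, 99, 101]) for _ in range(1_000_000)]

checkpoints = [10, 100, 1_000, 10_000, 1_000_000]
for n, mean in prefix_means(draws, checkpoints).items():
    print(f"first {n:>9,} runs: mean = {mean:.4f}, error = {abs(mean - 67):.4f}")
```

The exact means depend on the seed, so they will not match the values shown earlier, but the overall pattern should be the same: the error tends to shrink as more runs are included.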


Example Walk-Throughs with Worksheets

Video 1: Discovering the Law of Large Numbers

Follow along with the worksheet to work through the problem:

Practice Questions

Q1: The expected value of rolling a die is 3.5. If we simulate 10,000 rolls, what would we expect the cumulative average value of the rolls to be?
Q2: Which descriptive statistic is used to find the Expected Value?
Q3: Of the following, how many simulations will give us a cumulative average closest to the expected value?
Q4: How would we find the cumulative average?
Q5: In order to find the cumulative summation of every simulation we have made so far, what function should we use?