Python Functions for Random Distributions

The scipy.stats library in Python lets us represent random distributions in code. The library has dozens of distributions, including all commonly used ones. Three extremely common distributions are the normal, Bernoulli, and binomial distributions:

Normal Distribution

from scipy.stats import norm
N = norm()   # Normal Distribution

Bernoulli Distribution

from scipy.stats import bernoulli
B = bernoulli(p=0.2)   # Bernoulli Distribution with p=0.2

Binomial Distribution

from scipy.stats import binom
D = binom(p=0.1, n=50)   # Binomial Distribution with p=0.1, n=50

Once you have a variable holding a distribution, there are many Python functions we can use to perform calculations with it. The functions are largely the same no matter what distribution you have, so let's discover them via examples!
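As a minimal sketch of that shared interface, the loop below calls the same two methods on each of the three distributions defined above. The dictionary name `distributions` and the use of `.mean()` are illustrative choices that do not appear elsewhere on this page.

```python
from scipy.stats import norm, bernoulli, binom

# The three distributions defined above
distributions = {
    "normal": norm(),
    "bernoulli": bernoulli(p=0.2),
    "binomial": binom(n=50, p=0.1),
}

# Each distribution object answers the same method calls
for name, dist in distributions.items():
    print(name, dist.mean(), dist.cdf(0))
```

The printed numbers differ for each distribution, but the calls themselves do not change.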

Example Binomial Distribution

A simple binomial distribution that is easy to understand is a binomial distribution with n=2 and p=0.5 (two events, each with a 50% chance of success, like flipping a coin two times and finding out how many times we get heads). To create this distribution in Python:

from scipy.stats import binom
COIN = binom(n=2, p=0.5)

There are four possible outcomes -- HH, HT, TH, and TT. The binomial distribution models these outcomes:

There is a 25% probability of the outcome having zero heads (TT). This is represented when COIN returns the value 0 (zero heads).
There is a 50% probability of the outcome having exactly one head (TH or HT). This is represented when COIN returns the value 1 (exactly one head).
There is a 25% probability of the outcome having two heads (HH). This is represented when COIN returns the value 2 (exactly two heads).

We can represent this distribution as a table and a graph:

x	P( COIN == x )
0 zero heads	0.25
1 one head	0.5
2 two heads	0.25
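The probabilities in this table can also be checked by brute force. The sketch below uses only the Python standard library (not scipy.stats) to list the four equally likely outcomes and count the heads in each; the variable names are illustrative.

```python
from itertools import product
from collections import Counter

# All equally likely outcomes of two coin flips: HH, HT, TH, TT
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]

# Count how many outcomes produce each number of heads
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Convert the counts into probabilities, ordered by number of heads
probabilities = {
    heads: count / len(outcomes)
    for heads, count in sorted(heads_counts.items())
}
print(probabilities)
```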

CDF: Cumulative Distribution Function

The Cumulative Distribution Function or CDF is:

The probability of all outcomes less than or equal to a given value x.
Graphically, this is the total area of everything less than or equal to x (the total area to the left of x).

Using our two-coin flip example where COIN = binom(n=2, p=0.5), the CDF calls ask the following:

COIN.cdf(0.2) asks "what percentage of results have 0.2 or fewer heads?"
COIN.cdf(1) asks "what percentage of results have 1 or fewer heads?"
COIN.cdf(2) asks "what percentage of results have 2 or fewer heads?"

Python Example 1: `COIN.cdf(0.2)`

It's a bit strange to ask "what percentage of results have 0.2 or fewer heads?" since we cannot get a partial number of heads. Still, it's easy to see that the only number of heads less than or equal to 0.2 is zero heads. Since this happens only one out of four times, we expect the result to be 25%.

Running the code in Python:

COIN.cdf(0.2)

0.25

The cdf(0.2) of COIN is 0.25 (25%). This means that 25% of results will have 0.2 or fewer heads.

Python Example 2: `COIN.cdf(1)`

Similar to the first example, this asks "what percentage of results have 1 or fewer heads?" In this case, we can have either zero heads or one head. Since three of our four outcomes have zero or one heads (TT, TH, and HT), the CDF should be 3/4 or 75%. Let's check with Python:

COIN.cdf(1)

0.75

The cdf(1) of COIN is 0.75 (75%). This means that 75% of results will have 1 or fewer heads.

Python Example 3: `COIN.cdf(2)`

Similar to the first two examples, this asks "what percentage of results have 2 or fewer heads?" In this case, we can have zero heads, one head, or two heads, which is every possible result! This means there should be a 100% chance of having two or fewer heads. Checking with Python:

COIN.cdf(2)

1.0

The cdf(2) of COIN is 1 (100%). This means that 100% of results will have 2 or fewer heads.
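Another way to read these three results is that the CDF is a running total of the probabilities of 0, 1, and 2 heads. The sketch below compares that running total with `COIN.cdf`; the use of NumPy's `cumsum` and the variable names are illustrative assumptions, not part of the original walk-through.

```python
import numpy as np
from scipy.stats import binom

COIN = binom(n=2, p=0.5)

# Probabilities of 0, 1, and 2 heads for two fair coin flips
outcome_probabilities = np.array([0.25, 0.5, 0.25])

# Running total of those probabilities
running_total = np.cumsum(outcome_probabilities)

# CDF evaluated at 0, 1, and 2 heads
cdf_values = COIN.cdf([0, 1, 2])

print(running_total)
print(cdf_values)

# Between whole numbers the CDF stays flat, so 0.2 gives the same answer as 0
same_as_zero = COIN.cdf(0.2) == COIN.cdf(0)
print(same_as_zero)
```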

PPF: Percent Point Function

The Percent Point Function or PPF is the inverse of the CDF. Given a probability q, the PPF returns the point x where the probability of everything at or to the left of x equals q. For a discrete distribution such as COIN, where the CDF jumps from one value to the next, it returns the smallest x whose CDF is at least q. This can be thought of as the percentile function, since the PPF tells us the value at a given percentile of the distribution.

COIN.ppf(0.2) asks "what is the 20th percentile of heads?"
COIN.ppf(0.6) asks "what is the 60th percentile of heads?"
COIN.ppf(0.99) asks "what is the 99th percentile of heads?"

Examples

Examining the distribution for COIN, we can calculate the percentiles for each number of heads:

x	P( COIN == x )	Percentile Range
0 zero heads	0.25	0% - 25%
1 one head	0.5	25% - 75%
2 two heads	0.25	75% - 100%

Therefore, we expect that:

COIN.ppf(0.2), the 20th percentile, falls within 0 heads, and we expect the output to be 0.
COIN.ppf(0.6), the 60th percentile, falls within 1 head, and we expect the output to be 1.
COIN.ppf(0.99), the 99th percentile, falls within 2 heads, and we expect the output to be 2.

Verifying with Python:

print( COIN.ppf(0.2) )
print( COIN.ppf(0.6) )
print( COIN.ppf(0.99) )

0.0
1.0
2.0

The output of .ppf() on our COIN random variable.
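To see how the PPF relates back to the CDF, the sketch below checks each result against `COIN.cdf`. It assumes the usual reading for a discrete distribution: the value returned for q is the smallest number of heads whose CDF reaches q. The `results` list and the variable names are illustrative.

```python
from scipy.stats import binom

COIN = binom(n=2, p=0.5)

results = []
for q in [0.2, 0.6, 0.99]:
    x = COIN.ppf(q)
    # The CDF at x should be at least q...
    reaches_q = COIN.cdf(x) >= q
    # ...while the CDF one step lower should still be below q
    below_q_one_step_lower = COIN.cdf(x - 1) < q
    results.append((q, float(x), bool(reaches_q), bool(below_q_one_step_lower)))

for row in results:
    print(row)
```

Both checks come out true for all three percentiles, which matches the percentile ranges in the table above.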

PDF / PMF: Probability {Density/Mass} Functions

The .pmf() and .pdf() functions describe how likely the distribution is to produce a value at a specific point.

The Probability Mass Function (PMF), or .pmf(), is only defined on discrete distributions, where each value has a fixed probability of occurring.
The Probability Density Function (PDF), or .pdf(), is only defined on continuous distributions, where it gives the density at a specific point. The density is not itself a probability; the probability of landing within a window around the point is the area under the density curve over that window.
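For the continuous case, a short sketch with the standard normal distribution may help separate the value returned by `.pdf()` at a point from the probability of a window around that point. The window of -0.1 to 0.1 is an arbitrary choice for illustration.

```python
from scipy.stats import norm

N = norm()  # standard normal: mean 0, standard deviation 1

# Density at x = 0 (a height on the curve, not a probability)
density = N.pdf(0)

# Probability of landing in the window from -0.1 to 0.1, taken from the CDF
window_probability = N.cdf(0.1) - N.cdf(-0.1)

# For a narrow window, probability is roughly density times window width
approximation = density * 0.2

print(density, window_probability, approximation)
```

The last two numbers are close but not identical, because the density is not perfectly flat across the window.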

Probability Mass Function (PMF)

Earlier, we discussed that the probability of zero heads is 25% in our COIN binomial random variable. Therefore, we expect that COIN.pmf(0) should be 0.25:

COIN.pmf(0)

0.25

The pmf(0) of COIN is 0.25. This tells us that the probability of returning zero from our random variable is 25%.

Likewise, we expect pmf(1) to be 50% (for the 50% chance of flipping exactly one head) and pmf(2) to be 25% (for the 25% chance of flipping two heads):

print( COIN.pmf(0) )
print( COIN.pmf(1) )
print( COIN.pmf(2) )

0.25
0.5
0.25

The probability mass function of COIN.
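These values can also be reproduced by hand. The sketch below uses the standard binomial formula, C(n, k) * p^k * (1 - p)^(n - k), which this page does not derive, and compares it with `COIN.pmf`. It assumes Python 3.8 or later for `math.comb`.

```python
from math import comb
from scipy.stats import binom

n, p = 2, 0.5
COIN = binom(n=n, p=p)

# Binomial formula: C(n, k) * p^k * (1 - p)^(n - k) for k = 0, 1, 2 heads
by_hand = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Print the hand calculation next to the library result
for k, value in enumerate(by_hand):
    print(k, value, COIN.pmf(k))
```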

RVS: Random Value Sample

The .rvs() function draws random values from the distribution, with each value appearing in proportion to its probability: if something is 80% likely, that value will be sampled about 80% of the time. In COIN, we expect more results with 1 (50% occurrence of 1 head) than 0 or 2 (25% occurrence of either zero heads or two heads).

Generating a sample of 50 values:

COIN.rvs(50)

array([2, 1, 2, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 2, 2, 0, 0, 1, 1, 0, 1, 2,
1, 0, 2, 1, 1, 1, 1, 1, 1, 1, 0, 2, 0, 1, 1, 2, 2, 2, 0, 1, 2, 1,
1, 1, 1, 1, 1, 1])

Fifty random values sampled from COIN.

We can insert this data into a DataFrame and count the number of occurrences:

import pandas as pd

# Column 0 of the DataFrame holds the number of heads for each sampled value
df = pd.DataFrame( COIN.rvs(50) )
print(f"Zero Heads: {len( df[ df[0] == 0 ] )}")
print(f"One Head: {len( df[ df[0] == 1 ] )}")
print(f"Two Heads: {len( df[ df[0] == 2 ] )}")

Zero Heads: 10
One Head: 29
Two Heads: 11

Counts of the fifty random values sampled from COIN.

In this small simulation, we observe far more results of 1 than 0 or 2. This is the expected result discussed earlier.
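With a larger sample, the observed frequencies should settle close to the PMF values. The sketch below is one way to check this; the sample size of 100,000, the seed passed as `random_state=0`, and the use of NumPy's `unique` are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import binom

COIN = binom(n=2, p=0.5)

# random_state fixes the seed so the run is repeatable (the value 0 is arbitrary)
sample = COIN.rvs(size=100_000, random_state=0)

# Tally how often each number of heads appears in the sample
values, counts = np.unique(sample, return_counts=True)
frequencies = counts / sample.size

# Print each observed frequency next to the probability from the PMF
for value, frequency in zip(values, frequencies):
    print(value, frequency, COIN.pmf(value))
```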

Example Walk-Throughs with Worksheets