Python Functions for Bernoulli and Binomial Distribution


In Python, the scipy.stats library lets us represent random distributions, including both the Bernoulli and Binomial distributions. In this guide, we will explore the expected value, cumulative distribution function (CDF), percent point function (PPF), and probability mass function (PMF) of these distributions.
Recall:

  • Expected Value (EV): the expected value, or mean, of a random variable
    • For a Binomial random variable, EV = np
  • Cumulative Distribution Function (CDF): the probability of all outcomes less than or equal to a given value x
  • Percent Point Function (PPF): the smallest value x at which the cumulative probability reaches a given level q; it is the inverse of the CDF and is also known as the percentile function
  • Probability Mass Function (PMF): the probability of exactly one specific value occurring, defined for discrete distributions
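
As a rough illustration of the four definitions above, the following sketch computes each one by hand for a small made-up discrete distribution. The dictionary dist and the helper names (expected_value, pmf, cdf, ppf) are hypothetical and exist only for this illustration; they are not part of scipy.stats.

```python
# A made-up discrete distribution: value -> probability (probabilities sum to 1).
dist = {0: 0.5, 1: 0.3, 2: 0.2}

def expected_value(dist):
    # Probability-weighted average of the values.
    return sum(x * p for x, p in dist.items())

def pmf(dist, x):
    # Probability of exactly x (0 if x is not a possible value).
    return dist.get(x, 0.0)

def cdf(dist, x):
    # Total probability of all values less than or equal to x.
    return sum(p for value, p in dist.items() if value <= x)

def ppf(dist, q):
    # Smallest value whose cumulative probability reaches q (assumes 0 < q <= 1).
    for value in sorted(dist):
        if cdf(dist, value) >= q:
            return value
    return max(dist)  # guards against floating-point round-off near q = 1

print(expected_value(dist))
print(pmf(dist, 1))
print(cdf(dist, 1))
print(ppf(dist, 0.75))
```

Note that the PPF undoes the CDF: cdf answers "how much probability lies at or below x?", while ppf answers "which x do I need to reach this much probability?".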

Bernoulli Distribution

Recall that a Bernoulli random variable describes a single trial that has exactly two outcomes (success or failure) with a fixed probability of success. The distribution has one parameter, p, the probability of success.

To explore the Bernoulli distribution in Python, we will use a hypothetical lottery ticket with a 10% chance of winning:

#import scipy.stats library
from scipy.stats import bernoulli
#the lottery ticket is a bernoulli random variable with p=0.1
lottery = bernoulli(p=0.1)
Setting up our Bernoulli random variable: lottery

Finding the Expected Value (EV)

Statistically, the expected value of a Bernoulli random variable is simply the probability of the event occurring, p, which is 0.1 here.

# Expected Value:
ev = lottery.expect()  
print(f"Expected value: {ev}")
Expected value: 0.1
Using scipy.stats to explore our Bernoulli random variable, lottery

Finding the Cumulative Distribution Function (CDF) at X=1

Statistically, the CDF at X=1 is the total probability of all outcomes less than or equal to 1. Since a Bernoulli random variable has only two possible outcomes (zero wins, X=0, or one win, X=1), the CDF at X=1 is the sum of the probabilities of X=0 and X=1.

  • P(X = 0) = P(failure) = 90%

  • P(X = 1) = P(success) = 10%

  • CDF(X=1)
    = P(X ≤ 1)
    = P(X = 0) + P(X = 1)
    = 0.9 + 0.1
    = 1.0

# Cumulative Distribution Function (CDF) at X=1
cdf1 = lottery.cdf(1)
print(f"CDF at X=1: {cdf1}")
CDF at X=1: 1.0
Using scipy.stats to explore our Bernoulli random variable, lottery

Finding the Percent Point Function (PPF) at 0.75

Statistically, the PPF at 0.75 is the value of the distribution at the 75th percentile. The percentile ranges for this distribution are as follows:

x                    P(X = x)   Percentile Range
0 (zero successes)   0.9        0% - 90%
1 (one success)      0.1        90% - 100%

The PPF at 0.75 is 0 (zero wins), since the 75th percentile falls in the zero-wins percentile range.
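
The percentile lookup described above can be sketched without SciPy: the PPF returns the smallest outcome whose cumulative probability reaches the requested level. The function name bernoulli_ppf is hypothetical, and the sketch assumes 0 < q <= 1.

```python
def bernoulli_ppf(q, p):
    # P(X <= 0) = 1 - p, so every percentile up to 1 - p maps to 0 (failure);
    # anything above that maps to 1 (success).
    return 0 if q <= 1 - p else 1

for q in (0.25, 0.75, 0.95):
    print(q, bernoulli_ppf(q, p=0.1))
```

With p=0.1, the 25th and 75th percentiles both map to 0, while the 95th percentile maps to 1.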

# Percent Point Function (PPF) at q=0.75
ppf75 = lottery.ppf(0.75)
print(f"75th Percentile: {ppf75}")
75th Percentile: 0.0
Using scipy.stats to explore our Bernoulli random variable, lottery

Finding the Probability Mass Function (PMF) at X=1

Statistically, the PMF at X=1 is the probability of exactly one success (X=1) occurring. Examining the distribution:

x                    P(X = x)   Percentile Range
0 (zero successes)   0.9        0% - 90%
1 (one success)      0.1        90% - 100%

The PMF at X=1 is 0.1 (10%), since the chance of having exactly one success is 10%.

# Probability Mass Function (PMF) at X=1
pmf1 = lottery.pmf(1)
print(f"PMF at X=1: {pmf1}")
PMF at X=1: 0.1
Using scipy.stats to explore our Bernoulli random variable, lottery

Binomial Distribution

Recall that a Binomial random variable counts the number of successes in n independent trials, each with exactly two outcomes (success or failure) and the same fixed probability of success. The distribution has two parameters: n, the number of trials, and p, the probability of success on each trial. The Bernoulli random variable is the special case of the Binomial where n=1.
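
One way to see the relationship between the two distributions is to build the distribution of the number of successes by adding one Bernoulli trial at a time. The sketch below uses only plain Python; the helper names add_trial and successes_distribution are hypothetical, and the values of n and p are chosen only for illustration.

```python
def add_trial(counts, p):
    # counts[k] is the probability of k successes so far.
    # One more trial either fails (count stays k) or succeeds (count becomes k + 1).
    new_counts = [0.0] * (len(counts) + 1)
    for k, prob in enumerate(counts):
        new_counts[k] += prob * (1 - p)
        new_counts[k + 1] += prob * p
    return new_counts

def successes_distribution(n, p):
    counts = [1.0]  # before any trial: zero successes with certainty
    for _ in range(n):
        counts = add_trial(counts, p)
    return counts

print(successes_distribution(1, 0.1))  # one trial: the Bernoulli case
print(successes_distribution(4, 0.1))  # four trials
```

With n=1 the result is just the two Bernoulli probabilities; each extra trial adds one more possible count of successes.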

To explore the Binomial distribution in Python, we will continue with our lottery ticket example, but now we will explore what happens if we buy 4 lottery tickets, each with a 10% chance of winning:

#import scipy.stats library
from scipy.stats import binom
#the number of winning tickets is a binomial random variable with p=0.1 and n=4
lottery = binom(p=0.1, n=4)
Setting up our Binomial random variable: lottery

Finding the Expected Value (EV)

Statistically, the expected value is np, the number of trials multiplied by the probability of the event occurring. This gives the expected number of wins when we buy 4 lottery tickets.
np = 4*0.1 = 0.4

The expected number of lottery ticket wins is 0.4. The value printed below, 0.39999999999999997, differs from 0.4 only because of floating-point rounding.

# Expected Value:
ev = lottery.expect()  
print(f"Expected value: {ev}")
Expected value: 0.39999999999999997
Using scipy.stats to explore our Binomial random variable, lottery
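
As a cross-check on np, the sketch below computes the expected value directly from its definition, the sum of x times P(X = x) over all outcomes, using exact fractions from the standard library so that no rounding occurs. The helper name binom_pmf_exact is hypothetical and is not part of scipy.stats.

```python
from fractions import Fraction
from math import comb

def binom_pmf_exact(x, n, p):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x), computed with exact fractions.
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 4, Fraction(1, 10)
ev = sum(x * binom_pmf_exact(x, n, p) for x in range(n + 1))
print(ev)         # 2/5
print(float(ev))  # 0.4
```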

Finding the Cumulative Distribution Function (CDF) at X=2

Statistically, the CDF at X=2 is the total probability of all outcomes less than or equal to 2. Therefore, we want the total probability of zero wins (X=0), one win (X=1), and two wins (X=2).

Recall for a Binomial Random Variable:
$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$

  • $P(X = 0) = \binom{4}{0} (0.1)^{0} (1-0.1)^{4} = 0.6561$ (65.61%)
  • $P(X = 1) = \binom{4}{1} (0.1)^{1} (1-0.1)^{3} = 0.2916$ (29.16%)
  • $P(X = 2) = \binom{4}{2} (0.1)^{2} (1-0.1)^{2} = 0.0486$ (4.86%)
  • CDF(X=2)
    = P(X ≤ 2)
    = P(X = 0) + P(X = 1) + P(X = 2)
    = 0.6561 + 0.2916 + 0.0486
    = 0.9963

The probability of having at most 2 winning lottery tickets out of our 4 lottery tickets is 0.9963 (99.63%).
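
The hand calculation above can be reproduced in a few lines of standard-library Python. This is only a sketch for checking the arithmetic; the helper names binom_pmf and binom_cdf are hypothetical and are not the scipy.stats implementation.

```python
from math import comb

def binom_pmf(x, n, p):
    # Probability of exactly x successes in n independent trials.
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    # Probability of x or fewer successes: add up the PMF from 0 through x.
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

for k in range(3):
    print(f"P(X = {k}) = {binom_pmf(k, 4, 0.1):.4f}")  # 0.6561, 0.2916, 0.0486
print(f"P(X <= 2) = {binom_cdf(2, 4, 0.1):.4f}")       # 0.9963
```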

# Cumulative Distribution Function (CDF) at X=2
cdf = lottery.cdf(2)
print(f"CDF at X=2: {cdf}")
CDF at X=2: 0.9963
Using scipy.stats to explore our Binomial random variable, lottery

Finding the Percent Point Function (PPF) at 0.75

Statistically, the PPF at 0.75 is the value of the distribution at the 75th percentile. The percentile ranges for this distribution are as follows:

x                P(lottery = x)   Percentile Range
0 (zero wins)    0.6561           0% - 65.61%
1 (one win)      0.2916           65.61% - 94.77%
2 (two wins)     0.0486           94.77% - 99.63%
3 (three wins)   0.0036           99.63% - 99.99%
4 (four wins)    0.0001           99.99% - 100%

The PPF at 0.75 is 1 (one win), since the 75th percentile falls in the one-win percentile range.
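
The percentile ranges in the table can be generated by accumulating the probabilities, and the PPF is then the first outcome whose range reaches the requested percentile. The following is a minimal sketch using only the standard library; the helper name binom_ppf is hypothetical, and the sketch assumes 0 < q <= 1.

```python
from math import comb

n, p = 4, 0.1
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def binom_ppf(q):
    # Walk through the outcomes, accumulating probability until q is reached.
    cumulative = 0.0
    for x, prob in enumerate(pmf):
        cumulative += prob
        if cumulative >= q:
            return x
    return n  # guards against floating-point round-off near q = 1

# Rebuild the percentile ranges by running a cumulative total.
lower = 0.0
for x, prob in enumerate(pmf):
    upper = lower + prob
    print(f"{x}: {lower:.2%} - {upper:.2%}")
    lower = upper

print(binom_ppf(0.75))  # 1
```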

# Percent Point Function (PPF) at q=0.75
ppf75 = lottery.ppf(0.75)
print(f"75th Percentile: {ppf75}")
75th Percentile: 1.0
Using scipy.stats to explore our Binomial random variable, lottery

Finding the Probability Mass Function (PMF) at X=2

Statistically, the PMF at X=2 is the probability of exactly two successes (X=2) occurring. Examining the distribution:

x                P(lottery = x)   Percentile Range
0 (zero wins)    0.6561           0% - 65.61%
1 (one win)      0.2916           65.61% - 94.77%
2 (two wins)     0.0486           94.77% - 99.63%
3 (three wins)   0.0036           99.63% - 99.99%
4 (four wins)    0.0001           99.99% - 100%

The PMF at X=2 is 0.0486 (4.86%), since the chance of having exactly two successes is 4.86%.

We can also use the probability mass function formula of a binomial distribution to find the PMF at X=2.

Recall for a Binomial Random Variable:
$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$

  • $P(X = 2) = \binom{4}{2} (0.1)^{2} (1-0.1)^{2} = 0.0486$ (4.86%)

Again, we see that the PMF at X=2 is 0.0486 (4.86%).

# Probability Mass Function (PMF) at X=2
pmf = lottery.pmf(2)
print(f"PMF at X=2: {pmf}")
PMF at X=2: 0.0486
Using scipy.stats to explore our Binomial random variable, lottery

A detailed explanation of the Binomial Distribution is part of the DISCOVERY course content: