Random Variables

Data scientists use the term random variable for a variable whose numeric value is determined by the outcome of a random process. The domain of a random variable is the set of all possible outcomes of that process, and each outcome has a probability associated with it.

For example, we can simulate rolling a fair, six-sided die 100 times and count the number of 1's that appear.

The random variable is the number of 1's that appear. This count changes each time we repeat the process of rolling the die 100 times.
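The die-rolling simulation described above can be sketched in Python. This is a minimal sketch that uses the standard-library random module; the function name count_ones is hypothetical, and the fixed seed is there only so a run can be reproduced.

```python
import random

def count_ones(num_rolls=100):
    """Roll a fair six-sided die num_rolls times and count how many 1's appear."""
    rolls = [random.randint(1, 6) for _ in range(num_rolls)]  # each roll is 1..6
    return rolls.count(1)

random.seed(42)  # fixed seed only so the run is reproducible

# Repeat the whole 100-roll process five times; the count differs from run to run,
# which is exactly what makes it a random variable.
outcomes = [count_ones() for _ in range(5)]
print(outcomes)
```

Each entry in outcomes is one observed value of the random variable.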

Discrete vs. Continuous Random Variables

Random variables can be discrete or continuous. Discrete random variables have a countable number of possible values. Often these are counts, so their values are whole numbers.

For example: the number of pets you own, the number of people in attendance at an Illinois football game

Continuous random variables can take any value within an interval, so they have infinitely (in fact, uncountably) many possible values. In other words, these are usually measurements that can have decimals.

For example: the time it takes to run a mile, an interest rate, the weight of your pet
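To make the distinction concrete, here is a minimal Python sketch that draws one discrete and one continuous value. The models are assumptions chosen for illustration: the number of heads in 10 fair coin flips (discrete), and a hypothetical mile time drawn uniformly between 6 and 12 minutes (continuous).

```python
import random

random.seed(0)  # fixed seed only so the run is reproducible

# Discrete: a count of heads in 10 fair flips, so only whole numbers 0..10 can occur.
heads = sum(random.randint(0, 1) for _ in range(10))

# Continuous (hypothetical model): any value between 6 and 12 minutes can occur,
# including decimals.
mile_time = random.uniform(6, 12)

print(heads, round(mile_time, 2))
```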

Mean and Variance of Discrete Random Variables

Previously in DISCOVERY, we summarized a list of numbers by computing its average and SD. Now we'll compute the analogous summaries for random variables; in other words, we will look at the average and standard deviation of numbers generated by a chance process.

The mean of a random variable is also known as the expected value (commonly represented as EV). It is the long-run average we expect to get when we repeat a chance process over and over again. The expected value of a discrete random variable X is the sum of each possible value times its probability:

$$EV = E[X] = \sum_{x} x \cdot P(X = x)$$
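The expected-value formula can be checked with a short Python sketch. It assumes the distribution is given as a dictionary mapping each value x to P(X = x); the function name expected_value and the coin-toss example are hypothetical illustrations. Fractions keep the arithmetic exact.

```python
from fractions import Fraction

def expected_value(dist):
    """Return EV = sum of x * P(X = x) over every possible value x."""
    assert sum(dist.values()) == 1, "probabilities must add up to 1"
    return sum(x * p for x, p in dist.items())

# Hypothetical example: X = number of heads in two fair coin tosses.
two_tosses = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(expected_value(two_tosses))  # → 1
```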

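The standard deviation of a random variable is also known as the standard error (commonly represented as SE). The SE measures how far the random variable typically falls from its expected value. The SE of a discrete random variable X is the square root of the probability-weighted average of the squared deviations from the EV:

$$SE = \sqrt{\sum_{x} (x - EV)^2 \cdot P(X = x)}$$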
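A minimal Python sketch of the SE formula, assuming the same dictionary format (value → probability). The function name standard_error and the two-coin-toss example are hypothetical. It first computes the EV, then weights each squared deviation by its probability.

```python
import math
from fractions import Fraction

def standard_error(dist):
    """Return SE = sqrt(sum of (x - EV)^2 * P(X = x)) for a discrete distribution."""
    ev = sum(x * p for x, p in dist.items())                   # expected value
    variance = sum((x - ev) ** 2 * p for x, p in dist.items())  # weighted squared deviations
    return math.sqrt(variance)

# Hypothetical example: X = number of heads in two fair coin tosses (EV = 1).
two_tosses = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(standard_error(two_tosses))  # about 0.7071
```

Here the variance works out to exactly 1/2, so the SE is the square root of 1/2.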

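Lastly, we can make a histogram of a random variable. This kind of histogram shows every possible outcome of a chance process along with its probability, and it is known as a probability histogram. In Python, if the outcomes of many simulated repetitions are stored in a pandas DataFrame df, df.hist() draws a histogram of them whose shape approximates the probability histogram when the number of repetitions is large.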
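As a plotting-free sketch, the following Python code computes the exact probabilities for a hypothetical example, the sum of two fair six-sided dice, and prints one text bar per possible outcome. It uses only the standard library; each bar's length is the number of the 36 equally likely dice pairs that produce that sum, so it is proportional to the probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Exact probability of each possible sum, in increasing order of the sum.
probs = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

# Text probability histogram: one bar per outcome, length proportional to probability.
for total, p in probs.items():
    bar = "#" * counts[total]
    print(f"{total:>2} | {bar:<6} {float(p):.3f}")
```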

Practice Questions

Q1: Which of the following is a discrete random variable?

Q2: What is the expected value of a fair six-sided die?

Q3: What is another name for the expected value of a random variable?

Q4: Which type of histogram shows all possible outcomes of a chance process and their probabilities?

Q5: Standard error measures: