12  Random Variables

In many real-world situations, from rolling a die to measuring rainfall, we deal with outcomes that are inherently uncertain. To make sense of this uncertainty, we use random variables: mathematical tools that assign numerical values to the outcomes of random processes. They allow us to translate chance into structure, giving us a way to analyze and predict patterns in uncertain environments.

This section introduces the concept of random variables and the probability distributions that describe their behavior. A probability distribution not only lists the possible values a random variable can take, but also tells us how likely each value is to occur. Whether the variable counts discrete outcomes or measures continuous quantities, understanding its distribution is key to interpreting the role of randomness in data, decisions, and models.

12.1 What is a Random Variable?

In statistics, sampling should always be done randomly. This means that before the sampling occurs, we do not know who will be selected; it’s a random experiment where all individuals in the population are possible outcomes.

To introduce the idea of a random variable (also called a stochastic variable), imagine a small population of six individuals: Anna, Bob, Carol, David, Erin, Finn. This is shown in Figure 12.1 (a).

We are not primarily interested in the individuals themselves, but in the values they hold on some variable. For example, suppose we want to study whether each person has a driver’s license. Let’s assign a value of 1 to those who have a license and 0 to those who don’t. Now, instead of thinking about names, our outcome space transforms into a set of 1’s and 0’s, depending on the presence or absence of a driver’s license. This is shown in Figure 12.1 (b).

(a) Sample space with names.
(b) Sample space with coded variable of interest.
Figure 12.1: Random variables in sample space.

In probability theory, we often want to describe a random experiment using a variable before the experiment is carried out. Because we do not know in advance what the outcome will be, we use a random variable to represent the result. This variable is typically denoted by a capital letter, such as \(X\), \(Y\), or \(Z\), often chosen from the end of the alphabet.

For example, we might define the variable \(X\) as:

\(X\) = Number of people with a driver’s license

This variable can take on different values depending on who is selected during the sampling process. By grouping all individuals who share the same outcome value for our variable, we divide the outcome space into disjoint events. Suppose we define the event \(A\) as: “A randomly selected individual has a driver’s license.”

This allows us to split the outcome space into two distinct parts: those with \(X = 1\) (event \(A\)), and those with \(X = 0\) (event \(\bar{A}\)), as shown in Figure 12.2.

Figure 12.2: Outcome space divided into disjoint events.

If three of the six individuals in the population have a license, then the probability of selecting a person with a license is:

\[ P(X = 1) = \frac{3}{6} = 0.5 \]

And the probability of selecting someone without a license is:

\[ P(X = 0) = \frac{3}{6} = 0.5 \]

This defines the probability distribution of the variable \(X\). Note that infinitely many random variables can be defined for the same random experiment. For instance, with the same group of six people, we could let \(X\) represent vaccination status instead of license status. We would then specify the values the variable takes (e.g., 1 = vaccinated, 0 = not vaccinated), define the corresponding event \(A\), and determine the resulting probability distribution for this new version of \(X\).
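The distribution above is simple enough to verify directly. The following is a minimal sketch in Python (assuming NumPy; the simulation is our own illustration, not part of the text) that computes \(P(X = 1)\) for the six-person population and checks it against many repeated random draws:

```python
# Sketch of the license example from Figure 12.1, assuming NumPy.
import numpy as np

rng = np.random.default_rng(seed=42)

# Population coded as in Figure 12.1 (b): 1 = has a license, 0 = does not.
population = np.array([1, 1, 1, 0, 0, 0])

# Exact probability distribution of X for one random draw.
p_license = population.mean()          # P(X = 1) = 3/6 = 0.5
print(f"P(X = 1) = {p_license}")       # 0.5
print(f"P(X = 0) = {1 - p_license}")   # 0.5

# Long-run check: repeat the random experiment many times.
draws = rng.choice(population, size=10_000)
print(f"Estimated P(X = 1) = {draws.mean():.3f}")  # close to 0.5
```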

Once the random event has occurred and the experiment is completed, the stochastic variable will have assumed a specific value. At that point, we switch notation and use lowercase letters (such as \(x\), \(y\), or \(z\)) to refer to the observed values. We write \(P(x) = P(X = x)\), which defines the probability function for the variable \(X\). This function tells us the likelihood that \(X\) takes on specific values \(x\).

In summary, a random variable is a numerical variable whose value is determined by the outcome of a random experiment. Before the experiment is performed, we don’t know what value the variable will take, but we do know the set of possible values it may assume. Once the experiment is conducted, the variable takes on a definite value, and we can analyze its behavior using its probability distribution.

We use uppercase letters like \(X\) to denote random variables, and the corresponding lowercase letters (like \(x\)) to represent actual outcomes. Depending on the context, random variables may be discrete or continuous, and each has its own type of probability model.

12.2 Discrete vs. Continuous Random Variables

A discrete stochastic variable is one that can only take on a countable number of values. This might be a finite set, or it could be an infinite but countable one. Common examples include:

  • The number of dots shown when rolling a die
  • The sum of two dice rolls
  • The number of trials needed to roll a six
  • The number of girls in a randomly selected family with three children

On the other hand, a continuous stochastic variable can take on any value within an interval of the real number line. These variables are useful for modeling quantities that are measured rather than counted. Some examples include:

  • The length of a randomly selected newborn child
  • The lifespan of a randomly chosen light bulb
  • The annual income of a randomly chosen household
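To see the contrast concretely, here is a short sketch (assuming NumPy; the newborn-length parameters, mean 50 cm and standard deviation 2.5 cm, are invented for illustration) that draws from one discrete and one continuous variable:

```python
# Contrast between a discrete and a continuous random variable.
import numpy as np

rng = np.random.default_rng(seed=1)

# Discrete: number of dots when rolling a fair die (countable outcomes 1..6).
die_rolls = rng.integers(low=1, high=7, size=5)
print(die_rolls)             # e.g. [5 3 1 6 2] -- only whole numbers occur

# Continuous: length of a newborn, modeled here as a normal variable
# (any value in a range is possible, e.g. 49.83 cm).
lengths = rng.normal(loc=50.0, scale=2.5, size=5)
print(np.round(lengths, 2))  # e.g. [51.27 48.90 ...]
```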

12.3 Expected Value and Variance

A random variable’s probability distribution can, much like an empirical data set, be described using measures of central tendency and spread.

The expected value of a random variable, often denoted as \(E(X)\) or \(\mu\), represents the long-run average outcome of a random process if it were repeated many times. It’s a kind of “center of gravity” for the distribution — where the outcomes tend to balance out.

While the expected value tells us the average, the variance of a random variable, denoted \(\operatorname{Var}(X)\), tells us how spread out the values are around the mean. A small variance means values cluster tightly around the mean; a large variance means they are more widely scattered.
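As a concrete illustration, consider a fair die. The sketch below (assuming NumPy; the die example is our own) computes \(E(X)\) and \(\operatorname{Var}(X)\) exactly from the distribution, then checks that the long-run average of many rolls settles near \(E(X)\):

```python
# Exact E(X) and Var(X) for a fair die, plus a long-run simulation check.
import numpy as np

values = np.arange(1, 7)    # possible outcomes 1..6
probs = np.full(6, 1 / 6)   # fair die: each outcome has P = 1/6

mu = np.sum(values * probs)               # E(X) = sum of x * P(x) = 3.5
var = np.sum((values - mu) ** 2 * probs)  # Var(X) = E[(X - mu)^2] ~ 2.917
print(mu, var)

# The average of many rolls converges toward E(X).
rng = np.random.default_rng(seed=7)
rolls = rng.choice(values, size=100_000, p=probs)
print(rolls.mean(), rolls.var())  # approximately 3.5 and 2.917
```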

So even if we don’t yet know the outcome of a random process, we can describe how it typically behaves: where it centers (the expected value) and how much it varies (the variance, or equivalently its square root, the standard deviation). This way, we can summarize the behavior of randomness in both location and consistency, a powerful way to make sense of uncertainty. We explore these concepts further in the next chapter.

12.4 Linear Transformations of Random Variables

Suppose we know the expected value and variance of a random variable \(X\), and we define a new variable \(Y\) as a linear transformation of \(X\). For example, you might be adding a flat fee to a cost, or scaling everything up because of inflation.

Mathematically, you’ve created:

\[ Y = a + bX \]

Now you’re probably wondering: How does this change the average and the spread? In other words, we are interested in how this affects the expectation and variance. The following rules apply:

  • The expected value of \(Y\) is:

\[ E(Y) = E(a + bX) = a + bE(X) \]

  • The variance of \(Y\) is:

\[ \operatorname{Var}(Y) = \operatorname{Var}(a + bX) = b^2 \operatorname{Var}(X) \]

Adding a constant \(a\) shifts every value by the same amount, so it does not affect variability; scaling by a constant \(b\) multiplies the variance by \(b^2\) (and the standard deviation by \(|b|\)).
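A quick simulation can confirm both rules. In the sketch below (assuming NumPy), the flat fee \(a = 10\) and scale factor \(b = 2\) are invented for illustration:

```python
# Checking E(a + bX) = a + b*E(X) and Var(a + bX) = b^2 * Var(X).
import numpy as np

rng = np.random.default_rng(seed=3)
x = rng.choice([1, 2, 3, 4, 5, 6], size=100_000)  # rolls of a fair die

a, b = 10, 2
y = a + b * x   # the linear transformation Y = a + bX

print(y.mean(), a + b * x.mean())  # both approximately 17 = 10 + 2 * 3.5
print(y.var(), b**2 * x.var())     # both approximately 4 * 2.917 = 11.67
```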

If \(c\) is a constant (i.e., not a random variable):

  • The expected value of a constant is the constant itself:

\[ E(c) = c \]

  • The variance of a constant is zero:

\[ \operatorname{Var}(c) = 0 \]

Why zero variance? Because there’s nothing to vary — it’s always the same.
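As a trivial check (again a NumPy sketch of our own), a “random” variable that always takes the same value \(c\) has mean \(c\) and variance 0:

```python
# A constant treated as a degenerate random variable: every draw equals c.
import numpy as np

c = 4.0
draws = np.full(10_000, c)  # the "experiment" always yields c
print(draws.mean())         # 4.0 -> E(c) = c
print(draws.var())          # 0.0 -> Var(c) = 0
```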

We will see a practical example of these rules in the next chapter.

12.5 Standardizing a Random Variable

To compare random variables on a common scale, we often standardize them. Given a random variable \(X\) with expected value \(\mu_X\) and standard deviation \(\sigma_X\), we define:

\[ Z = \frac{X - \mu_X}{\sigma_X} \]

This transformation creates a standardized variable \(Z\), whose mean can be derived as follows using the rules from Section 12.4.

\[ E(Z) = E\left( \frac{X - \mu_X}{\sigma_X} \right) \]

Since \(\frac{1}{\sigma_X}\) is a constant, we can factor it out:

\[ E(Z) = \frac{1}{\sigma_X} \cdot E(X - \mu_X) \]

Using the linearity of expectation:

\[ E(X - \mu_X) = E(X) - \mu_X = \mu_X - \mu_X = 0 \]

So:

\[ E(Z) = \frac{1}{\sigma_X} \cdot 0 = 0 \]

Thus, the standardized variable \(Z\) has mean 0.

\[ E(Z) = 0 \]

Now we derive the variance of \(Z\):

\[ \operatorname{Var}(Z) = \operatorname{Var}\left( \frac{X - \mu_X}{\sigma_X} \right) \]

We apply the rule for scaling a random variable:

\[ \operatorname{Var}\left( \frac{X - \mu_X}{\sigma_X} \right) = \left( \frac{1}{\sigma_X} \right)^2 \cdot \operatorname{Var}(X - \mu_X) \]

Since subtracting a constant does not change variance:

\[ \operatorname{Var}(X - \mu_X) = \operatorname{Var}(X) = \sigma_X^2 \]

So:

\[ \operatorname{Var}(Z) = \frac{1}{\sigma_X^2} \cdot \sigma_X^2 = 1 \]

Thus, the standardized variable \(Z\) has variance 1.

\[ \operatorname{Var}(Z) = 1 \]

To summarize, by standardizing a random variable using

\[ Z = \frac{X - \mu_X}{\sigma_X} \]

we obtain a new variable \(Z\) with the following properties:

  • \(E(Z) = 0\)
  • \(\operatorname{Var}(Z) = 1\)

This makes \(Z\) easier to work with and allows comparisons between different variables, regardless of their original scale or units.

Standardized variables are especially useful when:

  • Comparing two variables measured in different units
  • Working with standard distributions (like the standard normal)
  • Simplifying statistical formulas and derivations

Standardization ensures that the new variable \(Z\) has no unit, zero mean, and unit variance — a common baseline in probability and statistics.
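As a numerical check, the sketch below (assuming NumPy; the die-roll data is our own illustration) standardizes simulated values and confirms that the result has mean near 0 and variance near 1:

```python
# Standardizing simulated values: Z = (X - mu_X) / sigma_X.
import numpy as np

rng = np.random.default_rng(seed=11)
x = rng.choice([1, 2, 3, 4, 5, 6], size=100_000)  # rolls of a fair die

mu_x = x.mean()     # estimate of E(X)
sigma_x = x.std()   # estimate of the standard deviation of X

z = (x - mu_x) / sigma_x
print(z.mean(), z.var())  # approximately 0 and 1, regardless of scale
```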

12.6 Exercises

For each of the random variables below, decide whether it is discrete or continuous:

  1. The number of books a student checked out from the library this week
  2. The time (in minutes) it takes a person to run 5 kilometers
  3. The number of heads in 10 coin tosses
  4. A person’s height (in centimeters)
  5. The number of emails received in one day
  6. The daily average temperature in a city (in °C)
  7. Whether a person owns a smartphone (yes = 1, no = 0)
Solution:

  1. Discrete – counts books, can only be whole numbers
  2. Continuous – time can be measured with decimals (e.g., 23.75 minutes)
  3. Discrete – number of heads is a count between 0 and 10
  4. Continuous – height is measurable and can take any value in a range
  5. Discrete – number of emails is countable
  6. Continuous – temperature is measured on a continuous scale
  7. Discrete – binary variable (yes/no)
For the next exercise, consider two workers, Alice and Bob, who work independently on a task. The time (in hours) it takes each of them to complete their part is modeled as a random variable:

  • \(X\) is Alice’s completion time, with \(E(X) = 5\) and \(\operatorname{Var}(X) = 1.5\)
  • \(Y\) is Bob’s completion time, with \(E(Y) = 6\) and \(\operatorname{Var}(Y) = 2\)

Assume \(X\) and \(Y\) are independent. Now define:

  • \(T = X + Y\) as the total time they spend
  • \(D = Y - X\) as the difference between their times

Compute \(E(T)\), \(\operatorname{Var}(T)\), \(E(D)\), and \(\operatorname{Var}(D)\).

Solution:

Total time: \(T = X + Y\). Using linearity of expectation:

\[ E(T) = E(X) + E(Y) = 5 + 6 = 11 \]

Since \(X\) and \(Y\) are independent:

\[ \operatorname{Var}(T) = \operatorname{Var}(X) + \operatorname{Var}(Y) = 1.5 + 2 = 3.5 \]

Time difference: \(D = Y - X\). Again, using linearity:

\[ E(D) = E(Y - X) = E(Y) - E(X) = 6 - 5 = 1 \]

Variance of the difference (since \(X\) and \(Y\) are independent):

\[ \operatorname{Var}(D) = \operatorname{Var}(Y) + \operatorname{Var}(X) = 2 + 1.5 = 3.5 \]
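These answers can also be checked by simulation. The exercise fixes only the means and variances, so any distributions with those moments will do; the normal distributions in the sketch below (assuming NumPy) are our own stand-in choice:

```python
# Simulation check of E(T), Var(T), E(D), Var(D) for independent X and Y.
import numpy as np

rng = np.random.default_rng(seed=5)
n = 100_000
x = rng.normal(loc=5, scale=np.sqrt(1.5), size=n)  # Alice: E = 5, Var = 1.5
y = rng.normal(loc=6, scale=np.sqrt(2.0), size=n)  # Bob:   E = 6, Var = 2

t = x + y   # total time T
d = y - x   # time difference D

print(t.mean(), t.var())  # approximately 11 and 3.5
print(d.mean(), d.var())  # approximately 1 and 3.5
```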