15  Jointly Distributed Random Variables

Up to this point, we’ve focused primarily on single random variables, for example, the number of heads in a series of coin flips or the height of a randomly selected person. But in many real-world scenarios, we are interested in studying multiple random variables at the same time.

In this chapter we take a close look at jointly distributed random variables and explore their joint, marginal, and conditional behavior. We then introduce the concepts of independence and correlation, and learn how to work with sums and differences.

15.1 Multivariate Random Variables

Several random variables may emerge from the same underlying random experiment, and they may or may not be related. Our goal is to understand their individual behavior and, more importantly, how they interact with each other.

For instance:

  • Suppose we randomly select one person from a population.
    Let \(X\) be their weight and \(Y\) be their height. These two measurements clearly relate to the same individual and may show some correlation: taller people tend to weigh more, on average.

  • Or imagine rolling a die twice.
    Let \(X\) be the number of dots on the first roll, and \(Y\) the number of dots on the second. Here, the two values result from separate events within the same experiment, and in this case, may be independent.

When we analyze such situations, we say that the variables \(X\) and \(Y\) are jointly distributed. That is, we consider their distributions together, not in isolation.

The concept extends naturally beyond just two variables. We may consider \(n\) random variables \(X_1, X_2, \dots, X_n\) that are all jointly distributed, meaning they are defined from a single experiment and may or may not be dependent on each other.

Consider the experiment of rolling a die 10 times. We can define a sequence of variables:

\[ X_1 = \text{dots on roll 1}, \quad X_2 = \text{dots on roll 2}, \quad \dots, \quad X_{10} = \text{dots on roll 10} \]

Together, these form 10 jointly distributed random variables. In this case, each variable reflects a distinct outcome from a shared process (a sequence of rolls) and is naturally treated as part of a multivariate random structure.
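To make this concrete, here is a minimal simulation sketch in Python with NumPy (the language choice, the seed, and the variable names such as `rolls` are ours, not part of the text). One call produces one joint outcome of \(X_1, \dots, X_{10}\); repeating the experiment produces many joint outcomes.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# One run of the experiment: roll a fair die 10 times.
# The ten entries of `rolls` are the jointly distributed
# random variables X_1, ..., X_10 for this run.
rolls = rng.integers(low=1, high=7, size=10)
print(rolls)

# Repeating the experiment many times gives many joint outcomes,
# one row per repetition.
many_runs = rng.integers(low=1, high=7, size=(10_000, 10))
print(many_runs.shape)  # (10000, 10)
```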

15.2 Joint Probability Distribution (Discrete Case)

Let \(X\) and \(Y\) be two discrete random variables observed together. Then we define their joint probability mass function (joint PMF) as:

\[ P(x, y) = P(X = x \text{ and } Y = y) = P(X = x \cap Y = y) \]

This function gives the probability of every possible pair of outcomes \((x, y)\). It’s essentially a table (or function) that assigns a probability to each combination of values that \(X\) and \(Y\) might jointly take.

When we refer to the joint distribution of \(X\) and \(Y\), we mean the complete collection of these probabilities, i.e. the full map of how the two variables behave together.

In the next sections, we’ll explore how to work with joint distributions, derive marginal and conditional probabilities, and investigate whether or not the variables are independent.

Example 15.1: Family Composition

Let’s consider a practical example. Suppose we randomly select a family from a large population. Let:

  • \(X\) = number of boys in the family
  • \(Y\) = number of girls in the family

Below is the joint probability distribution for \(X\) and \(Y\), represented in table form. Table 15.1 tells us how likely it is to observe each specific combination of boys and girls.

Table 15.1: Joint Probability Table for Number of Boys (\(X\)) and Girls (\(Y\)).

| Number of boys (\(X\)) | \(Y = 0\) | \(Y = 1\) | \(Y = 2\) | \(Y = 3\) | \(Y = 4\) | \(P(X = x)\) |
|---|---|---|---|---|---|---|
| 0 | 0.38 | 0.16 | 0.04 | 0.01 | 0.01 | 0.60 |
| 1 | 0.17 | 0.08 | 0.02 | 0.00 | 0.00 | 0.27 |
| 2 | 0.05 | 0.02 | 0.01 | 0.00 | 0.00 | 0.08 |
| 3 | 0.02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.03 |
| 4 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 |
| \(P(Y = y)\) | 0.64 | 0.27 | 0.07 | 0.01 | 0.01 | 1.00 |

Table 15.1 above shows the full joint probability distribution of the number of boys (\(X\)) and girls (\(Y\)) in a randomly selected family. Each cell in the main body of the table gives the probability that a family has a specific combination of boys and girls. For example, the probability of having 1 boy and 2 girls is 0.02.
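For readers who like to verify such tables in code, here is a minimal Python/NumPy sketch (the array representation and the variable name `joint` are ours) that stores Table 15.1 and checks that it is a valid joint PMF.

```python
import numpy as np

# Rows are X = 0..4 (boys), columns are Y = 0..4 (girls); cf. Table 15.1.
joint = np.array([
    [0.38, 0.16, 0.04, 0.01, 0.01],
    [0.17, 0.08, 0.02, 0.00, 0.00],
    [0.05, 0.02, 0.01, 0.00, 0.00],
    [0.02, 0.01, 0.00, 0.00, 0.00],
    [0.02, 0.00, 0.00, 0.00, 0.00],
])

print(joint.sum())   # 1.0 (up to floating-point rounding) -- a valid joint PMF
print(joint[1, 2])   # 0.02 = P(X = 1, Y = 2), one boy and two girls
```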

15.3 Marginal Distributions

From the joint distribution, we can compute the marginal distributions for \(X\) and \(Y\). These represent the individual behavior of each variable, ignoring the other.

The marginal probability distribution for \(X\) is obtained by summing each row across all \(y\) values:

\[ P(X = x) = \sum_y P(x, y) \]

Similarly, the marginal distribution for \(Y\) comes from summing each column across all \(x\) values:

\[ P(Y = y) = \sum_x P(x, y) \]

The row and column totals in the table reflect these marginal probabilities.

Example 15.1: Family Composition (cont’d)

At the end of each row in Table 15.1, we see the marginal distribution of \(X\), denoted \(P(X = x)\). These values are computed by summing the probabilities across each row, that is, over all values of \(Y\) for a fixed \(X\). The interpretation is simple: \(P(X = 1) = 0.27\) means that 27% of the families have exactly one boy, regardless of how many girls they have.

Similarly, the bottom row contains the marginal distribution of \(Y\), labeled \(P(Y = y)\). These probabilities are calculated by summing down each column — over all values of \(X\) for a fixed number of girls \(Y = y\). For instance, \(P(Y = 0) = 0.64\) indicates that 64% of the families have no girls, irrespective of how many boys they have.

These marginal distributions summarize the overall behavior of each variable independently, while the joint distribution reflects how the variables interact.
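The row and column sums are easy to reproduce numerically. Here is a short sketch, again using our NumPy copy of Table 15.1:

```python
import numpy as np

# Joint PMF of X (rows, boys 0..4) and Y (columns, girls 0..4) from Table 15.1.
joint = np.array([
    [0.38, 0.16, 0.04, 0.01, 0.01],
    [0.17, 0.08, 0.02, 0.00, 0.00],
    [0.05, 0.02, 0.01, 0.00, 0.00],
    [0.02, 0.01, 0.00, 0.00, 0.00],
    [0.02, 0.00, 0.00, 0.00, 0.00],
])

# P(X = x): sum each row over all y values.
marginal_X = joint.sum(axis=1)
# P(Y = y): sum each column over all x values.
marginal_Y = joint.sum(axis=0)

print(marginal_X)  # ≈ [0.60 0.27 0.08 0.03 0.02]
print(marginal_Y)  # ≈ [0.64 0.27 0.07 0.01 0.01]
```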

15.4 Conditional Distributions

With the joint and marginal distributions in hand, we can now define and compute conditional probabilities. These allow us to ask more nuanced questions, such as:

  • “What is the probability a family has two boys given that it has no girls?”
  • “How does the number of boys vary given that the family has two girls?”

To compute a conditional distribution, we use the definition:

\[ P(X = x \mid Y = y) = \frac{P(X = x \cap Y = y)}{P(Y = y)} \]

This gives us the probability of a specific \(X\) value, assuming we know the value of \(Y\). We calculate this by dividing each entry in a given column by the total at the bottom of that column.

Example 15.1: Family Composition (cont’d)

Using the formula:

\[ P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} \]

We divide each cell in a column (fixed \(Y = y\)) by the column total to obtain the conditional distribution of \(X\) given \(Y = y\).

For example, to compute \(P(X = 0 \mid Y = 1)\): \[ P(X = 0 \mid Y = 1) = \frac{P(X = 0 \cap Y = 1)}{P(Y = 1)} = \frac{0.16}{0.27} \approx 0.59 \]

This tells us that among families with exactly one girl, 59% of them have no boys.

Table 15.2 shows the conditional distributions of \(X\) given the various values of \(Y\):

Table 15.2: Conditional Probabilities \(P(X = x \mid Y = y)\).

| Number of boys (\(X\)) | \(Y = 0\) | \(Y = 1\) | \(Y = 2\) | \(Y = 3\) | \(Y = 4\) | \(P(X = x)\) |
|---|---|---|---|---|---|---|
| 0 | 0.59 | 0.59 | 0.57 | 1.00 | 1.00 | 0.60 |
| 1 | 0.27 | 0.30 | 0.29 | 0.00 | 0.00 | 0.27 |
| 2 | 0.08 | 0.07 | 0.14 | 0.00 | 0.00 | 0.08 |
| 3 | 0.03 | 0.04 | 0.00 | 0.00 | 0.00 | 0.03 |
| 4 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 |
| Total | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

Note that each column now sums to 1, since we are conditioning on a fixed \(Y = y\). In other words, each column-normalized set of values gives the conditional distribution of \(X\) given \(Y = y\) (the final column simply repeats the marginal distribution of \(X\) for comparison). These distributions allow us to examine how the number of boys is influenced by the known number of girls. We can also reverse the roles and analyze \(P(Y = y \mid X = x)\) by normalizing each row.

This approach allows us to analyze how knowledge of one variable informs us about the likely values of the other, a key idea that leads naturally into concepts like independence, correlation, and covariance, which we’ll explore next.
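The column normalization described above is a one-line operation once the joint table is stored as an array; the sketch below (our NumPy copy of Table 15.1 again) reproduces Table 15.2, including \(P(X = 0 \mid Y = 1) \approx 0.59\).

```python
import numpy as np

# Joint PMF of X (rows, boys) and Y (columns, girls) from Table 15.1.
joint = np.array([
    [0.38, 0.16, 0.04, 0.01, 0.01],
    [0.17, 0.08, 0.02, 0.00, 0.00],
    [0.05, 0.02, 0.01, 0.00, 0.00],
    [0.02, 0.01, 0.00, 0.00, 0.00],
    [0.02, 0.00, 0.00, 0.00, 0.00],
])

marginal_Y = joint.sum(axis=0)          # column totals P(Y = y)

# P(X = x | Y = y): divide each column by its column total.
cond_X_given_Y = joint / marginal_Y     # broadcasting divides column-wise

print(cond_X_given_Y[0, 1])             # ≈ 0.593, i.e. P(X = 0 | Y = 1)
print(cond_X_given_Y.sum(axis=0))       # each column sums to 1
```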

15.5 Independence of Random Variables

When working with two or more random variables defined on the same experiment, an important question is whether they are independent — that is, whether the value of one gives us any information about the other.

Two discrete random variables \(X\) and \(Y\) are said to be independent if and only if:

\[ P(X = x, Y = y) = P(X = x) \cdot P(Y = y) \]

for all combinations of values \(x\) and \(y\).

This definition mirrors what we’ve already seen for independence between events:
If \(A\) and \(B\) are events, then they are independent if:

\[ P(A \cap B) = P(A) \cdot P(B) \]

In this setting, each event \(X = x\) and \(Y = y\) can be thought of as a particular outcome — and we require that every pair of outcomes satisfies the multiplicative rule to call \(X\) and \(Y\) independent.

In applied settings, we often use prior experience or domain knowledge to assume independence between variables when it’s reasonable. For instance, if two numerical quantities are measured in such a way that one could not possibly influence the other, we usually model them as independent.

Example 15.2: Rolling Dice Twice

Let:

  • \(X\) = number of dots on the first die roll
  • \(Y\) = number of dots on the second die roll

If the dice rolls are fair and separate, then the outcome of the first roll has no effect on the second. Therefore, we assume that \(X\) and \(Y\) are independent random variables.

The joint probability of both rolls resulting in a six is:

\[ P(X = 6, Y = 6) = P(X = 6) \cdot P(Y = 6) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36} \]
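A quick simulation, offered here only as an optional sketch, estimates this probability by generating the two rolls independently; the relative frequency of a double six should be close to \(1/36 \approx 0.0278\).

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 1_000_000

X = rng.integers(1, 7, size=n)   # first roll
Y = rng.integers(1, 7, size=n)   # second roll, generated independently

estimate = np.mean((X == 6) & (Y == 6))
print(estimate, 1 / 36)          # the estimate should be close to 0.0277...
```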

Example 15.1: Family Composition (cont’d)

Let’s revisit the earlier example with the joint probability distribution of the number of boys (\(X\)) and girls (\(Y\)) in a family and ask: are the number of boys and the number of girls independent?

To test for independence, we compare:

\[ P(X = x, Y = y) \stackrel{?}{=} P(X = x) \cdot P(Y = y) \]

For example:

  • From Table 15.1 we get that \(P(X = 1, Y = 1) = 0.08\)
  • Marginals: \(P(X = 1) = 0.27\), \(P(Y = 1) = 0.27\)

So:

\[ P(X = 1) \cdot P(Y = 1) = 0.27 \cdot 0.27 = 0.0729 \]

Since \(0.0729 \neq 0.08\), the condition does not hold, and a single pair of values violating the product rule is enough to rule out independence. Therefore, \(X\) and \(Y\) are not independent. This implies that the number of boys in a family is related to the number of girls, which is not surprising, as both are constrained by the total family size.
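In fact, the comparison can be run for every cell at once. The sketch below (using our NumPy copy of Table 15.1) builds the table that the marginals would produce under independence via an outer product and confirms that it differs from the actual joint table.

```python
import numpy as np

# Joint PMF of X (rows, boys) and Y (columns, girls) from Table 15.1.
joint = np.array([
    [0.38, 0.16, 0.04, 0.01, 0.01],
    [0.17, 0.08, 0.02, 0.00, 0.00],
    [0.05, 0.02, 0.01, 0.00, 0.00],
    [0.02, 0.01, 0.00, 0.00, 0.00],
    [0.02, 0.00, 0.00, 0.00, 0.00],
])

marginal_X = joint.sum(axis=1)
marginal_Y = joint.sum(axis=0)

# What the joint table would look like IF X and Y were independent.
product = np.outer(marginal_X, marginal_Y)

print(product[1, 1])                # 0.27 * 0.27 = 0.0729
print(joint[1, 1])                  # 0.08 -- does not match
print(np.allclose(joint, product))  # False -> X and Y are not independent
```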

15.6 Constructing a Joint Distribution Under Independence

Suppose \(X\) and \(Y\) are two independent random variables defined as follows:

  • \(X\) takes values 1 and 2 with probabilities 0.4 and 0.6
  • \(Y\) takes values 1, 2, and 3 with probabilities 0.2, 0.5, and 0.3

Since \(X\) and \(Y\) are independent, the joint probability is obtained by multiplying the marginal probabilities for each combination of \(x\) and \(y\). Doing so leads to Table 15.3.

Table 15.3: Joint Probability Table for Independent Random Variables \(X\) and \(Y\).

| \(X\) | \(Y = 1\) | \(Y = 2\) | \(Y = 3\) | \(P(X = x)\) |
|---|---|---|---|---|
| 1 | 0.08 | 0.20 | 0.12 | 0.40 |
| 2 | 0.12 | 0.30 | 0.18 | 0.60 |
| \(P(Y = y)\) | 0.20 | 0.50 | 0.30 | 1.00 |

Each cell is computed as:

\[ P(X = x, Y = y) = P(X = x) \cdot P(Y = y) \]

For example:

  • \(P(X = 1, Y = 1) = 0.4 \cdot 0.2 = 0.08\)
  • \(P(X = 2, Y = 2) = 0.6 \cdot 0.5 = 0.30\)

Table 15.3 reflects the idea that independent random variables “don’t talk to each other” — knowing the value of one gives no information about the other.
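In code, building a joint table under independence is exactly this outer product of the two marginal vectors; the short sketch below (the array names are ours) reconstructs Table 15.3.

```python
import numpy as np

p_X = np.array([0.4, 0.6])        # P(X = 1), P(X = 2)
p_Y = np.array([0.2, 0.5, 0.3])   # P(Y = 1), P(Y = 2), P(Y = 3)

# Joint PMF under independence: P(X = x, Y = y) = P(X = x) * P(Y = y).
joint = np.outer(p_X, p_Y)

print(joint)
# ≈ [[0.08 0.20 0.12]
#    [0.12 0.30 0.18]]
print(joint.sum())                # 1.0 -- a valid joint PMF
```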

The definition of independence extends beyond just two variables. A set of random variables \(X_1, X_2, \dots, X_n\) is said to be mutually independent if and only if:

\[ P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = P(X_1 = x_1) \cdot P(X_2 = x_2) \cdots P(X_n = x_n) \]

for all combinations of values \(x_1, x_2, \dots, x_n\).

This definition guarantees that knowing the value of one variable gives us no information about the values of the others — all outcomes occur independently of one another.
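As an optional illustration of the product rule for several variables, the sketch below (a hypothetical three-dice example of our own) multiplies marginal probabilities to obtain the probability of one particular joint outcome.

```python
from math import prod

# Marginal PMF of a single fair die, shared by X1, X2, X3.
die_pmf = {face: 1 / 6 for face in range(1, 7)}

def joint_prob(outcome, pmfs):
    """P(X1 = x1, ..., Xn = xn) for mutually independent variables."""
    return prod(pmf[x] for pmf, x in zip(pmfs, outcome))

# Probability that three independent dice show (6, 6, 6).
print(joint_prob((6, 6, 6), [die_pmf, die_pmf, die_pmf]))   # 1/216 ≈ 0.00463
```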

15.7 Covariance and Correlation: Observed Data

Now, suppose we have a data set with two measured variables \(x\) and \(y\):

| Obs. | \(x_i\) | \(y_i\) |
|---|---|---|
| 1 | \(x_1\) | \(y_1\) |
| 2 | \(x_2\) | \(y_2\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(n\) | \(x_n\) | \(y_n\) |

The sample covariance between \(x\) and \(y\) is defined as:

\[ s_{xy} = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]

  • If \(s_{xy} > 0\), \(x\) and \(y\) tend to increase together (positive linear relationship).
  • If \(s_{xy} < 0\), one tends to increase as the other decreases (negative relationship).

A more standardized measure is the sample correlation coefficient \(r_{xy}\):

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]

where \(s_x\) and \(s_y\) are the standard deviations of \(x\) and \(y\) respectively.

Alternatively, this can be computed as:

\[ r_{xy} = \frac{\sum_{i = 1}^n (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum_{i = 1}^n (x_i - \bar{x})^2} \cdot \sqrt{\sum_{i = 1}^n (y_i - \bar{y})^2}} \]

This formula provides a unitless measure of linear association, bounded by \(-1 \leq r_{xy} \leq 1\).

  • \(r_{xy} = 1\) means perfect positive linear correlation
  • \(r_{xy} = -1\) means perfect negative linear correlation
  • \(r_{xy} = 0\) means no linear correlation (but nonlinear relationships may still exist)

For more details see Section 7.6.
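These sample quantities are available directly in NumPy; the sketch below uses a small made-up data set (the numbers are purely illustrative) and checks the formulas above against `np.cov` and `np.corrcoef`.

```python
import numpy as np

# Hypothetical paired observations (x_i, y_i), for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

n = len(x)
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)   # sample covariance
r_xy = s_xy / (x.std(ddof=1) * y.std(ddof=1))              # sample correlation

print(s_xy)
print(r_xy)
print(np.cov(x, y)[0, 1])        # same covariance via NumPy
print(np.corrcoef(x, y)[0, 1])   # same correlation via NumPy
```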

15.8 Covariance and Correlation for Random Variables

We now extend these ideas to random variables.

The theoretical covariance of two random variables \(X\) and \(Y\) is defined as:

\[ \operatorname{Cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] \]

In discrete form:

\[ \operatorname{Cov}(X, Y) = \sum_x \sum_y (x - \mu_X)(y - \mu_Y) P(x, y) \]

We also use a computational shortcut:

\[ \operatorname{Cov}(X, Y) = E(XY) - \mu_X \mu_Y \]

The correlation coefficient between two random variables \(X\) and \(Y\) is:

\[ \rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} \]

This is a standardized version of covariance, ranging between \(-1\) and \(1\), and easier to interpret.

Just like with sample data:

  • \(\rho = 1\) implies perfect positive linear association
  • \(\rho = -1\) implies perfect negative linear association
  • \(\rho = 0\) means \(X\) and \(Y\) are uncorrelated

The correlation coefficient always shares the same sign as the covariance, that is, if the covariance is positive, the correlation is positive, and vice versa.

This makes correlation a useful, unitless summary of the strength and direction of the linear association between two variables — especially when comparing across different data sets or variables with different units.
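The definitions translate directly into a few lines of code. The sketch below (reusing our NumPy copy of Table 15.1) computes \(\operatorname{Cov}(X, Y)\) and \(\rho_{XY}\) for the family-composition example via the shortcut formula.

```python
import numpy as np

# Joint PMF of X (rows, boys 0..4) and Y (columns, girls 0..4) from Table 15.1.
joint = np.array([
    [0.38, 0.16, 0.04, 0.01, 0.01],
    [0.17, 0.08, 0.02, 0.00, 0.00],
    [0.05, 0.02, 0.01, 0.00, 0.00],
    [0.02, 0.01, 0.00, 0.00, 0.00],
    [0.02, 0.00, 0.00, 0.00, 0.00],
])
x_vals = np.arange(5)
y_vals = np.arange(5)

p_X = joint.sum(axis=1)
p_Y = joint.sum(axis=0)

mu_X = np.sum(x_vals * p_X)
mu_Y = np.sum(y_vals * p_Y)

# E(XY) = sum over all cells of x * y * P(x, y).
E_XY = np.sum(np.outer(x_vals, y_vals) * joint)

cov_XY = E_XY - mu_X * mu_Y                    # shortcut formula
var_X = np.sum(x_vals**2 * p_X) - mu_X**2
var_Y = np.sum(y_vals**2 * p_Y) - mu_Y**2
rho_XY = cov_XY / np.sqrt(var_X * var_Y)

print(cov_XY, rho_XY)   # ≈ -0.058 and ≈ -0.085: a weak negative association
```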

15.9 Expected Value and Variance of Sums and Differences

We now wrap up this chapter with a powerful and commonly used set of results: how to compute the expected value and variance of a sum or difference of two random variables. This is related to what we covered in Section 12.4.

Let \(X\) and \(Y\) be two random variables. Suppose we are interested in the distribution of their sum \(X + Y\) or difference \(X - Y\). The following rules apply:

The expected value behaves linearly, meaning:

\[ E(X + Y) = E(X) + E(Y) \]

and

\[ E(X - Y) = E(X) - E(Y) \]

This rule is always true, regardless of whether \(X\) and \(Y\) are independent.

Variance, however, is not generally linear. The formula involves the covariance between \(X\) and \(Y\):

\[ \begin{aligned} \operatorname{Var}(X + Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) + 2 \cdot \operatorname{Cov}(X, Y) \\ \operatorname{Var}(X - Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) - 2 \cdot \operatorname{Cov}(X, Y) \end{aligned} \]

So when variables are positively correlated, the variance of their sum increases. When they’re negatively correlated, the variance of the sum decreases.
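As a sanity check, the identity can be verified numerically from any joint PMF; the sketch below uses a small hypothetical \(2 \times 2\) joint table (the numbers are ours, chosen only for illustration) and compares \(\operatorname{Var}(X + Y)\) computed directly with the right-hand side of the formula.

```python
import numpy as np

# A small hypothetical joint PMF; rows: X in {0, 1}, columns: Y in {0, 1}.
joint = np.array([[0.30, 0.20],
                  [0.10, 0.40]])
x_vals = np.array([0, 1])
y_vals = np.array([0, 1])

p_X, p_Y = joint.sum(axis=1), joint.sum(axis=0)
mu_X, mu_Y = np.sum(x_vals * p_X), np.sum(y_vals * p_Y)
var_X = np.sum(x_vals**2 * p_X) - mu_X**2
var_Y = np.sum(y_vals**2 * p_Y) - mu_Y**2
cov = np.sum(np.outer(x_vals, y_vals) * joint) - mu_X * mu_Y

# Var(X + Y) computed directly from the distribution of the sum.
s_vals = x_vals[:, None] + y_vals[None, :]      # value of X + Y in each cell
E_S = np.sum(s_vals * joint)
var_S_direct = np.sum(s_vals**2 * joint) - E_S**2

var_S_formula = var_X + var_Y + 2 * cov
print(var_S_direct, var_S_formula)              # the two values agree
```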

A special case concerns uncorrelated variables. If \(X\) and \(Y\) are uncorrelated, that is:

\[ \operatorname{Cov}(X, Y) = 0 \]

then the formulas simplify to:

\[ \begin{aligned} \operatorname{Var}(X + Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) \\ \operatorname{Var}(X - Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) \end{aligned} \]

It’s important to note that if \(X\) and \(Y\) are independent, then they are also uncorrelated. That is, independence implies zero covariance (and hence zero correlation):

\[ X \perp Y \quad \Rightarrow \quad \operatorname{Cov}(X, Y) = 0 \quad \text{and} \quad \rho_{XY} = 0 \]

However, the converse is not necessarily true: two variables can have zero covariance (i.e., be uncorrelated) and still be dependent, just not in a linear way. For example, \(Y = X^2\) is nonlinearly related to \(X\) yet can have \(\rho_{XY} = 0\). This is further illustrated in Exercise 2 below.

Exercises

  1. Given the joint distribution below:
| \(x\) | \(y = 0\) | \(y = 1\) | \(y = 2\) | \(P(x)\) |
|---|---|---|---|---|
| 1 | 0.08 | 0.12 | 0.30 | 0.50 |
| 2 | 0.12 | 0.18 | 0.20 | 0.50 |
| \(P(y)\) | 0.20 | 0.30 | 0.50 | 1.00 |

Calculate the correlation \(\rho_{XY}\) between \(X\) and \(Y\).

First, use marginal distributions to compute all relevant expected values: \[ E(X) = 1 \cdot 0.50 + 2 \cdot 0.50 = 0.50 + 1.00 = 1.50 \] \[ E(Y) = 0 \cdot 0.20 + 1 \cdot 0.30 + 2 \cdot 0.50 = 0 + 0.30 + 1.00 = 1.30 \]

We multiply each \((x, y)\) pair by its joint probability:

\[ \begin{aligned} E(XY) &= 1 \cdot 0 \cdot 0.08 + 1 \cdot 1 \cdot 0.12 + 1 \cdot 2 \cdot 0.30 \\ &+ 2 \cdot 0 \cdot 0.12 + 2 \cdot 1 \cdot 0.18 + 2 \cdot 2 \cdot 0.20 \\ &= 0 + 0.12 + 0.60 + 0 + 0.36 + 0.80 = 1.88 \end{aligned} \]

Use the shortcut formula:

\[ \operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) = 1.88 - (1.50)(1.30) = 1.88 - 1.95 = -0.07 \] Use the marginal distributions again to compute variances: \[ \begin{aligned} E(X^2) &= 1^2 \cdot 0.50 + 2^2 \cdot 0.50 = 0.50 + 2.00 = 2.50 \\ \operatorname{Var}(X) &= E(X^2) - [E(X)]^2 = 2.50 - 1.50^2 = 2.50 - 2.25 = 0.25 \\ \sigma_X &= \sqrt{0.25} = 0.50 \end{aligned} \] \[ \begin{aligned} E(Y^2) &= 0^2 \cdot 0.20 + 1^2 \cdot 0.30 + 2^2 \cdot 0.50 = 0 + 0.30 + 2.00 = 2.30 \\ \operatorname{Var}(Y) &= E(Y^2) - [E(Y)]^2 = 2.30 - 1.30^2 = 2.30 - 1.69 = 0.61 \\ \sigma_Y &= \sqrt{0.61} \approx 0.78 \end{aligned} \]

Now we have everything needed to compute the correlation: \[ \rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{-0.07}{0.50 \cdot 0.78} \approx \boxed{-0.18} \]

This indicates a weak negative linear relationship between \(X\) and \(Y\).
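These calculations are easy to double-check in code; here is a short sketch of ours that follows the same steps and reproduces \(\rho_{XY} \approx -0.18\).

```python
import numpy as np

# Joint PMF from Exercise 1: rows X in {1, 2}, columns Y in {0, 1, 2}.
joint = np.array([[0.08, 0.12, 0.30],
                  [0.12, 0.18, 0.20]])
x_vals = np.array([1, 2])
y_vals = np.array([0, 1, 2])

p_X, p_Y = joint.sum(axis=1), joint.sum(axis=0)
E_X = np.sum(x_vals * p_X)                         # 1.50
E_Y = np.sum(y_vals * p_Y)                         # 1.30
E_XY = np.sum(np.outer(x_vals, y_vals) * joint)    # 1.88

cov = E_XY - E_X * E_Y                             # -0.07
sd_X = np.sqrt(np.sum(x_vals**2 * p_X) - E_X**2)   # 0.50
sd_Y = np.sqrt(np.sum(y_vals**2 * p_Y) - E_Y**2)   # ≈ 0.78

print(cov / (sd_X * sd_Y))                         # ≈ -0.18
```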

  2. Let \(X\) be a discrete random variable taking the values \(-1\), \(0\), and \(1\), each with equal probability:
| \(X\) | \(P(X)\) |
|---|---|
| \(-1\) | 1/3 |
| 0 | 1/3 |
| 1 | 1/3 |

Define a new random variable \(Y = X^2\).

  • Are \(X\) and \(Y\) dependent or independent?
  • Compute \(\operatorname{Cov}(X, Y)\) and determine whether they are correlated.

Since \(Y = X^2\), the value of \(Y\) is entirely determined by \(X\). So \(X\) and \(Y\) are clearly dependent.

Next, compute \(E(X)\) and \(E(Y)\): \[ E(X) = (-1) \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} = 0 \]

\[ E(Y) = 1 \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} = \frac{2}{3} \]

To compute \(E(XY)\), we first list the value of \(XY\) for each possible pair:

  • \(X = -1\), \(Y = 1\): \(XY = -1\)
  • \(X = 0\), \(Y = 0\): \(XY = 0\)
  • \(X = 1\), \(Y = 1\): \(XY = 1\)

Then: \[ E(XY) = (-1) \cdot \frac{1}{3} + 0 + 1 \cdot \frac{1}{3} = 0 \]

Next we compute the covariance:

\[ \operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0 - (0 \cdot \tfrac{2}{3}) = 0 \]

This means: \[ \operatorname{Cov}(X, Y) = 0 \quad \text{and} \quad \rho_{XY} = 0 \] We see here that even though \(X\) and \(Y\) are clearly dependent, their covariance and correlation are zero. This illustrates that uncorrelated does not imply independent.
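The same conclusion can be checked numerically; the sketch below (our own) confirms that the covariance is exactly zero even though \(Y = X^2\) is a deterministic function of \(X\).

```python
import numpy as np

x_vals = np.array([-1, 0, 1])
p_X = np.array([1, 1, 1]) / 3          # X is -1, 0, 1 with equal probability

y_vals = x_vals**2                     # Y = X^2, fully determined by X

E_X = np.sum(x_vals * p_X)             # 0
E_Y = np.sum(y_vals * p_X)             # 2/3
E_XY = np.sum(x_vals * y_vals * p_X)   # 0, since x * x^2 = x^3 is an odd function

print(E_XY - E_X * E_Y)                # 0.0 -> Cov(X, Y) = 0, yet Y depends on X
```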