Mathematical Expectation

If \(f(x)\) is the probability mass function (p.m.f.) of the discrete random variable \(X\) with support \(S\), and if the summation:

\[\sum_{x\in S} u(x)f(x)\]

exists (that is, it is finite), then the resulting sum is called the mathematical expectation, or the expected value, of the function \(u(X)\), where \(x\) denotes an individual point of the support \(S\). The expectation of the function \(u(X)\) is denoted as \(E[u(X)]\):

\[E[u(X)] = \sum_{x\in S} u(x)f(x).\]
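As a small illustration of the definition, the sketch below evaluates the sum for a fair six-sided die. The die, the helper name `expectation`, and the dictionary representation of the p.m.f. are assumptions made for this example only; they are not part of the definition above.

```python
from fractions import Fraction

def expectation(u, pmf):
    """Return E[u(X)], the sum of u(x) * f(x) over the support of X.

    `pmf` maps each support point x to its probability f(x).
    """
    return sum(u(x) * p for x, p in pmf.items())

# Assumed example: a fair six-sided die, f(x) = 1/6 for x = 1, ..., 6.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

e_x = expectation(lambda x: x, die_pmf)       # u(X) = X
e_x2 = expectation(lambda x: x * x, die_pmf)  # u(X) = X^2

print(e_x, e_x2)
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared directly with a hand calculation.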

Population Mean and Variance

If we consider one particular function: \(u(X) = X\), then the expectation of \(u(X)\):

\[E[u(X)] = E[X] = \sum_{x\in S} xf(x),\]

is called the expected value of \(X\), denoted as \(E[X]\). It is also called the mean of \(X\), denoted as \(\mu\).

Now, let’s consider another function, \(u(X) = (X-\mu)^2\). The corresponding expectation:

\[E[u(X)] = E[(X-\mu)^2] = \sum_{x\in S}(x-\mu)^2f(x),\]

is called the variance of \(X\), denoted as \(\mathrm{Var}(X)\) or \(\sigma^2\).
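The two definitions can be checked numerically on a small distribution. The following is a minimal sketch, again assuming a fair six-sided die as the population; the function names `pmf_mean` and `pmf_variance` are hypothetical helpers introduced here.

```python
from fractions import Fraction

def pmf_mean(pmf):
    """Population mean: sum of x * f(x) over the support."""
    return sum(x * p for x, p in pmf.items())

def pmf_variance(pmf):
    """Population variance: sum of (x - mu)^2 * f(x) over the support."""
    mu = pmf_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Assumed example: a fair six-sided die.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = pmf_mean(die_pmf)
sigma2 = pmf_variance(die_pmf)

print(mu, sigma2)
```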

Sample Mean and Variance

If the population is too large to examine in full and we still want to describe it using its mean and variance, then it is useful to select a sample and calculate the sample mean and sample variance as estimates.

The sample mean is simply the average of the \(n\) data points \(x_1\), \(x_2\), …, \(x_n\):

\[\bar{x} = \frac{x_1+x_2+\cdots+x_n}{n} = \frac{1}{n}\sum_{i=1}^n x_i.\]

The sample variance summarizes the “spread” or “variation” of the data:

\[\begin{aligned} s^2 &= \frac{(x_1-\bar{x})^2+(x_2-\bar{x})^2+ \cdots +(x_n-\bar{x})^2}{n-1} \\ &= \frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2. \end{aligned}\]
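The two formulas translate directly into code. The sketch below computes them by hand for a made-up data set and compares the results with Python's standard `statistics` module, whose `variance` function also divides by \(n-1\). The data values are an assumption chosen purely for illustration.

```python
import math
import statistics

def sample_mean(data):
    """Average of the n data points."""
    return sum(data) / len(data)

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sample_mean(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

# Assumed example data, not taken from any real study.
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = sample_mean(data)
s2 = sample_variance(data)

# Cross-check against the standard library (variance uses n - 1).
print(x_bar, statistics.mean(data))
print(s2, statistics.variance(data))
```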

Important Properties

Note that the denominator of the sample variance is \(n-1\). Why is that? Before we analyse that, we first look at some important properties of expectation. The proofs are given in the next section.

Property 1:

If \(c\) is a constant, then \(E[c] = c\) and \(E[cu(X)] = cE[u(X)]\).

Property 2:

\(E[u_1(X) + u_2(X)] = E[u_1(X)] + E[u_2(X)]\).

Property 3:

If \(X_1\), \(X_2\), …, \(X_n\) are \(n\) independent random variables with means \(\mu_1\), \(\mu_2\), …, \(\mu_n\) and variances \(\sigma_1^2\), \(\sigma_2^2\), …, \(\sigma_n^2\), then the mean and variance of the linear combination \(Y = \sum_{i=1}^{n}a_iX_i\) (where the \(a_i\) are real constants) are:

\[\begin{aligned} \mu_Y &= \sum_{i=1}^n a_i\mu_i \\ \sigma_Y^2 &= \sum_{i=1}^n a_i^2 \sigma_i^2. \end{aligned}\]
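Property 3 can be sanity-checked by exact enumeration on a small case. The sketch below assumes two independent fair dice \(X_1\), \(X_2\) and the arbitrarily chosen coefficients \(a_1 = 2\), \(a_2 = -3\); it computes the mean and variance of \(Y\) directly from the joint distribution and compares them with the two formulas. It is a numerical check for one example, not a proof.

```python
from fractions import Fraction
from itertools import product

# Assumed setup: two independent fair dice, so each pair (x1, x2)
# has probability 1/36.
faces = range(1, 7)
p = Fraction(1, 36)
a1, a2 = 2, -3  # arbitrary illustrative coefficients

# All values of Y = a1*X1 + a2*X2 over the 36 equally likely outcomes.
y_values = [a1 * x1 + a2 * x2 for x1, x2 in product(faces, repeat=2)]

mu_y = sum(y * p for y in y_values)
var_y = sum((y - mu_y) ** 2 * p for y in y_values)

# Mean and variance of a single fair die.
mu, sigma2 = Fraction(7, 2), Fraction(35, 12)

print(mu_y, a1 * mu + a2 * mu)
print(var_y, (a1 ** 2 + a2 ** 2) * sigma2)
```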

Proof

Let’s prove the three properties in turn.

Property 1:

\[\begin{aligned} E[c] &= \sum_{x\in S}cf(x) = c\sum_{x\in S}f(x) = c \times 1 = c \\ E[cu(X)] &= \sum_{x\in S}cu(x)f(x) = c\sum_{x\in S}u(x)f(x) = cE[u(X)]. \end{aligned}\]

Property 2:

\[\begin{aligned} E[u_1(X) + u_2(X)] &= \sum_{x\in S} (u_1(x) + u_2(x))f(x) \\ &= \sum_{x\in S} u_1(x)f(x) + \sum_{x\in S} u_2(x)f(x) \\ &= E[u_1(X)] + E[u_2(X)]. \end{aligned}\]

Property 3:

\[\begin{aligned} \mu_Y &= E[Y] = E[\sum_{i=1}^n a_iX_i] = \sum_{i=1}^n E[a_iX_i] \\ &= \sum_{i=1}^n a_i E[X_i] = \sum_{i=1}^n a_i\mu_i \end{aligned}\] \[\begin{aligned} \sigma_Y^2 &= E[(Y-\mu_Y)^2] \\ &= E[(\sum_{i=1}^n a_iX_i - \sum_{i=1}^n a_i\mu_i)^2] \\ &= E[(\sum_{i=1}^na_i(X_i-\mu_i))^2] \\ &= E[(\sum_{i=1}^na_i(X_i-\mu_i))\cdot (\sum_{j=1}^na_j(X_j-\mu_j))] \\ &= E[\sum_{i=1}^n\sum_{j=1}^n a_ia_j(X_i-\mu_i)(X_j-\mu_j)] \\ &= \sum_{i=1}^n\sum_{j=1}^n a_ia_j E[(X_i-\mu_i)(X_j-\mu_j)]. \end{aligned}\]

Since \(X_1\), \(X_2\), …, \(X_n\) are independent random variables, for any two variables \(X_i\) and \(X_j\) with \(i\neq j\) the cross term factorizes: \(E[(X_i-\mu_i)(X_j-\mu_j)] = E[X_i-\mu_i]\,E[X_j-\mu_j] = 0\). Only the terms with \(i = j\) remain, which leads us to:

\[\sigma_Y^2 = \sum_{i=1}^n a_i^2 E[(X_i-\mu_i)^2] = \sum_{i=1}^n a_i^2 \sigma_i^2.\]

Why is the Denominator of Sample Variance n-1?

If we take the number of data points \(n\) as the denominator of the sample variance, we will see that this sample variance is a biased estimator of the population variance:

\[\begin{aligned} E[s_n^2] &= E\left[\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\right] \\ &= E\left[\frac{1}{n}\sum_{i=1}^n\left[(x_i-\mu)-(\bar{x}-\mu)\right]^2\right] \\ &= E\left[\frac{1}{n}\sum_{i=1}^n\left((x_i-\mu)^2-2(x_i-\mu)(\bar{x}-\mu)+(\bar{x}-\mu)^2\right)\right] \\ &= E\left[\frac{1}{n}\sum_{i=1}^n(x_i-\mu)^2 - \frac{2(\bar{x}-\mu)}{n}\underbrace{\sum_{i=1}^n(x_i-\mu)}_{n(\bar{x}-\mu)} + (\bar{x}-\mu)^2\right] \\ &= E[(X-\mu)^2]- E[2(\bar{x}-\mu)(\bar{x}-\mu)] + E[(\bar{x}-\mu)^2]\\ &= E[(X-\mu)^2] -E[(\bar{x}-\mu)^2] \\ &= \sigma^2 - \mathrm{Var}(\bar{x}). \end{aligned}\]

On average, the estimator with denominator \(n\) therefore falls short of \(\sigma^2\) by \(\mathrm{Var}(\bar{x})\), i.e. the variance of the sample mean. According to Property 3, this term can be calculated by:

\[\begin{aligned} \mathrm{Var}(\bar{x}) &= \mathrm{Var}(\frac{1}{n}\sum_{i=1}^nx_i) = \mathrm{Var}(\sum_{i=1}^n \frac{1}{n}x_i) \\ &= \sum_{i=1}^n \frac{1}{n^2}\mathrm{Var}(x_i) = \frac{1}{n^2}\sum_{i=1}^n\mathrm{Var}(x_i). \end{aligned}\]

Since \(x_1\), \(x_2\), …, \(x_n\) are a random sample from a distribution with variance \(\sigma^2\), it follows that for each \(i = 1, 2, \ldots, n\): \(\mathrm{Var}(x_i)=\sigma^2\). Thus, the variance of the sample mean simplifies to:

\[\mathrm{Var}(\bar{x}) = \frac{1}{n}\sigma^2.\]
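This result can be confirmed exactly on a small case. The sketch below assumes samples of size \(n = 2\) drawn independently from a fair six-sided die (population variance \(35/12\)), lists all 36 equally likely samples, and computes the variance of their means.

```python
from fractions import Fraction
from itertools import product

# Assumed setup: samples of size n drawn independently from a fair die.
n = 2
faces = range(1, 7)
samples = list(product(faces, repeat=n))
weight = Fraction(1, len(samples))  # every sample is equally likely

# Sample mean of each possible sample.
means = [Fraction(sum(s), n) for s in samples]

mean_of_means = sum(m * weight for m in means)
var_of_means = sum((m - mean_of_means) ** 2 * weight for m in means)

sigma2 = Fraction(35, 12)  # population variance of a fair die

print(var_of_means, sigma2 / n)
```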

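Now, we know that the expected value of the uncorrected sample variance is \(E[s_n^2] = \sigma^2 - \frac{1}{n}\sigma^2 = \frac{n-1}{n}\sigma^2\). Multiplying \(s_n^2\) by \(\frac{n}{n-1}\) removes this bias, so the corrected sample variance is: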

\[\begin{aligned} s^2 &= \frac{n}{n-1}s_n^2 = \frac{n}{n-1}\cdot \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2 \\ &= \frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2. \end{aligned}\]
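The effect of the correction can be seen without any simulation noise by averaging both estimators over every possible sample. The sketch below assumes samples of size \(n = 3\) from a fair six-sided die; the helper `variance_with_denominator` is a hypothetical name introduced for this example.

```python
from fractions import Fraction
from itertools import product

def variance_with_denominator(sample, denom):
    """Sum of squared deviations from the sample mean, divided by `denom`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / denom

# Assumed setup: all 6**3 equally likely samples of size 3 from a fair die.
n = 3
faces = range(1, 7)
samples = list(product(faces, repeat=n))
weight = Fraction(1, len(samples))

# Exact expected value of each estimator over all possible samples.
expected_uncorrected = sum(variance_with_denominator(s, n) * weight for s in samples)
expected_corrected = sum(variance_with_denominator(s, n - 1) * weight for s in samples)

sigma2 = Fraction(35, 12)  # population variance of a fair die

print(expected_uncorrected, Fraction(n - 1, n) * sigma2)
print(expected_corrected, sigma2)
```

Dividing by \(n\) gives an average of \(\frac{n-1}{n}\sigma^2\), while dividing by \(n-1\) recovers \(\sigma^2\) for this example.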

Reference