If we want to compare the average effects elicited by \(k\) different levels of some given factor, we draw \(k\) independent random samples of sizes \(n_j\ (j=1,2,\dots,k)\), so the total sample size is \(n=\displaystyle\sum_{j=1}^{k}n_j\). Let \(Y_{ij}\) represent the \(i^{th}\) observation recorded for the \(j^{th}\) level. \[ \begin{array}{|c|cccc|} \hline &&\text{treatment levels}&&\\ \hline & 1 & 2 & \cdots & k \\ \hline & Y_{11} & Y_{12} & \cdots & Y_{1k} \\ & Y_{21} & Y_{22} & \cdots & Y_{2k} \\ &\vdots &\vdots &\ddots&\vdots \\ &Y_{n_1 1} &Y_{n_2 2} &\cdots&Y_{n_k k} \\ \text{Sample sizes:}&n_1&n_2&\cdots&n_k\\ \text{Sample totals:}&T_{.1}&T_{.2}&\cdots&T_{.k}\\ \text{Sample means:}&\overline Y_{.1}&\overline Y_{.2}&\cdots&\overline Y_{.k}\\ \text{True means:}&\mu_1&\mu_2&\cdots&\mu_k\\ \hline \end{array} \]
Here \[T_{.j}=\displaystyle\sum_{i=1}^{n_j}Y_{ij},\qquad \overline Y_{.j}=\frac{1}{n_j}\displaystyle\sum_{i=1}^{n_j}Y_{ij}=\frac{T_{.j}}{n_j},\] the overall total is \[T_{..}=\sum_{j=1}^{k}\sum_{i=1}^{n_j}Y_{ij}=\sum_{j=1}^{k}T_{.j},\] and the overall mean is \[\overline Y_{..}=\frac{1}{n}T_{..}.\]
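A quick numerical illustration of this notation, using \(k=3\) hypothetical treatment levels with unequal sample sizes (the data are made up for the example):

```python
# Made-up data: k = 3 treatment levels with unequal sample sizes.
samples = [
    [4.1, 5.0, 4.7],        # level j = 1, n_1 = 3
    [6.2, 5.8, 6.0, 6.4],   # level j = 2, n_2 = 4
    [5.1, 5.3],             # level j = 3, n_3 = 2
]

n_j = [len(s) for s in samples]              # sample sizes n_1, ..., n_k
T_j = [sum(s) for s in samples]              # sample totals T_{.j}
Ybar_j = [t / m for t, m in zip(T_j, n_j)]   # sample means Ybar_{.j}

n = sum(n_j)       # total sample size n
T = sum(T_j)       # overall total T_{..}
Ybar = T / n       # overall mean Ybar_{..}
```

Each quantity in the table above maps to one line of the script; the overall mean is the total of all observations divided by \(n\), not the unweighted average of the group means.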
Now we assume that for each \(j\), \(Y_{1j}, Y_{2j}, \dots, Y_{n_j j}\) is a random sample from a normal distribution with mean \(\mu_j\), \(j = 1, 2, \dots, k\), and common variance \(\sigma^2\) (constant for all \(j\)), and that the \(k\) samples are independent of one another. The maximum likelihood estimator of \(\mu_j\) is \(\overline Y_{.j}\), and the maximum likelihood estimator of the overall mean \(\mu=\frac{1}{n}\sum_{j=1}^{k}n_j\mu_j\) is \(\overline Y_{..}\).
The treatment sum of squares (\(SSTR\)) estimates the variation among the \(\mu_j\)'s and is defined by \[\begin{align} SSTR=\sum_{j=1}^{k}\sum_{i=1}^{n_j}(\overline Y_{.j}-\overline Y_{..})^2 &=\sum_{j=1}^{k}n_j(\overline Y_{.j}-\overline Y_{..})^2\\ &=\sum_{j=1}^{k}n_j\Bigl[(\overline Y_{.j}-\mu)-(\overline Y_{..}-\mu)\Bigr]^2\\ &=\sum_{j=1}^{k}n_j\Bigl[(\overline Y_{.j}-\mu)^2-2(\overline Y_{.j}-\mu)(\overline Y_{..}-\mu)+(\overline Y_{..}-\mu)^2\Bigr]\\ &=\sum_{j=1}^{k}n_j(\overline Y_{.j}-\mu)^2+\sum_{j=1}^{k}n_j(\overline Y_{..}-\mu)^2-2\sum_{j=1}^{k}n_j(\overline Y_{.j}-\mu)(\overline Y_{..}-\mu)\\ &=\sum_{j=1}^{k}n_j(\overline Y_{.j}-\mu)^2+n(\overline Y_{..}-\mu)^2-2(\overline Y_{..}-\mu)\,n(\overline Y_{..}-\mu)\\ &=\sum_{j=1}^{k}n_j(\overline Y_{.j}-\mu)^2-n(\overline Y_{..}-\mu)^2\\ &=\sum_{j=1}^{k}n_j\overline Y_{.j}^2-2\sum_{j=1}^{k}n_j\overline Y_{.j}\mu+\sum_{j=1}^{k}n_j\mu^2-n\overline Y_{..}^2+2n\overline Y_{..}\mu-n\mu^2\\ &=\sum_{j=1}^{k}n_j\overline Y_{.j}^2-n\overline Y_{..}^2\\ &=\sum_{j=1}^{k}\frac{T_{.j}^2}{n_j}-\frac{T_{..}^2}{n} \end{align}\] where we use \(\sum_{j=1}^{k}n_j(\overline Y_{.j}-\mu)=n(\overline Y_{..}-\mu)\), and the cross terms in the expansion cancel because \(\sum_{j=1}^{k}n_j\overline Y_{.j}=n\overline Y_{..}\) and \(\sum_{j=1}^{k}n_j=n\). Then, since \(E(\overline Y_{..})=\mu\) and \(Var(\overline Y_{..})=\sigma^2/n\), \[\begin{align} E(SSTR)&=E\Bigl(\sum_{j=1}^{k}n_j(\overline Y_{.j}-\mu)^2-n(\overline Y_{..}-\mu)^2\Bigr)\\ &=\sum_{j=1}^{k}n_jE(\overline Y_{.j}-\mu)^2-nE(\overline Y_{..}-\mu)^2\\ &=\sum_{j=1}^{k}n_j\Bigl(Var(\overline Y_{.j})+\bigl(E(\overline Y_{.j}-\mu)\bigr)^2\Bigr)-n\cdot\frac{\sigma^2}{n}\\ &=\sum_{j=1}^{k}n_j\cdot\frac{\sigma^2}{n_j}+\sum_{j=1}^{k}n_j\bigl(E(\overline Y_{.j}-\mu)\bigr)^2-\sigma^2\\ &=(k-1)\sigma^2+\sum_{j=1}^{k}n_j(\mu_j-\mu)^2 \end{align}\]
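The definitional form of \(SSTR\) and the computational shortcut \(\sum_j T_{.j}^2/n_j - T_{..}^2/n\) can be checked numerically (the data below are made up for the check):

```python
# Verify that the definitional and computational forms of SSTR agree:
#   sum_j n_j * (Ybar_j - Ybar)^2  ==  sum_j T_j^2 / n_j  -  T^2 / n
samples = [[4.1, 5.0, 4.7], [6.2, 5.8, 6.0, 6.4], [5.1, 5.3]]

n_j = [len(s) for s in samples]
T_j = [sum(s) for s in samples]
n = sum(n_j)
T = sum(T_j)
Ybar = T / n

# Definitional form: weighted squared deviations of group means.
sstr_def = sum(m * (t / m - Ybar) ** 2 for m, t in zip(n_j, T_j))

# Computational form: uses only the totals.
sstr_comp = sum(t ** 2 / m for t, m in zip(T_j, n_j)) - T ** 2 / n
```

The two expressions are algebraically identical, so they agree up to floating-point rounding.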
Suppose \(\sigma^2\) is known. If the null hypothesis that the treatment-level means are all equal, \(H_0:\mu_1 = \mu_2 = \cdots = \mu_k=\mu\), is true, then \(E(SSTR)=(k-1)\sigma^2\) and \(\frac{SSTR}{\sigma^2}\) has a \(\chi^2\) distribution with \(k - 1\) degrees of freedom.
When \(\sigma^2\) is unknown, the \(j^{th}\) sample variance is \[S_j^2=\frac{1}{n_j-1}\displaystyle\sum_{i=1}^{n_j}(Y_{ij}-\overline Y_{.j})^2,\] and the error sum of squares (\(SSE\)) is \[SSE=\sum_{j=1}^{k}(n_j-1)S_j^2=\sum_{j=1}^{k}\sum_{i=1}^{n_j}(Y_{ij}-\overline Y_{.j})^2.\]
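The two forms of \(SSE\) above, one built from the sample variances and one a direct double sum, can be verified with the standard library's `statistics.variance` (which uses the \(n_j-1\) divisor); the data are made up:

```python
# Check that SSE computed from the sample variances,
#   sum_j (n_j - 1) * S_j^2,
# matches the direct double-sum definition.
from statistics import variance  # sample variance with divisor n_j - 1

samples = [[4.1, 5.0, 4.7], [6.2, 5.8, 6.0, 6.4], [5.1, 5.3]]
means = [sum(s) / len(s) for s in samples]

sse_var = sum((len(s) - 1) * variance(s) for s in samples)
sse_def = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)
```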
Whether or not \(\mu_1 = \mu_2 = \cdots = \mu_k\) is true, \((n_j-1)S_j^2/\sigma^2\) has a \(\chi^2\) distribution with \(n_j - 1\) degrees of freedom; since the \(k\) samples are independent and \(\sum_{j=1}^{k}(n_j-1)=n-k\), \(SSE/\sigma^2\) has a \(\chi^2\) distribution with \(n - k\) degrees of freedom.
The total sum of squares (\(SSTOT\)) measures the variation of the data about the overall mean \(\overline Y_{..}\), \[\begin{align} SSTOT&=\sum_{j=1}^{k}\sum_{i=1}^{n_j}(Y_{ij}-\overline Y_{..})^2\\ &=\sum_{j=1}^{k}\sum_{i=1}^{n_j}\Bigl[(Y_{ij}-\overline Y_{.j})+(\overline Y_{.j}-\overline Y_{..})\Bigr]^2\\ &=\sum_{j=1}^{k}\sum_{i=1}^{n_j}(Y_{ij}-\overline Y_{.j})^2+\sum_{j=1}^{k}\sum_{i=1}^{n_j}(\overline Y_{.j}-\overline Y_{..})^2+2\sum_{j=1}^{k}\sum_{i=1}^{n_j}(Y_{ij}-\overline Y_{.j})(\overline Y_{.j}-\overline Y_{..})\\ &=\sum_{j=1}^{k}\sum_{i=1}^{n_j}(Y_{ij}-\overline Y_{.j})^2+\sum_{j=1}^{k}\sum_{i=1}^{n_j}(\overline Y_{.j}-\overline Y_{..})^2+2\sum_{j=1}^{k}(\overline Y_{.j}-\overline Y_{..})\sum_{i=1}^{n_j}(Y_{ij}-\overline Y_{.j})\\ &=\sum_{j=1}^{k}\sum_{i=1}^{n_j}(Y_{ij}-\overline Y_{.j})^2+\sum_{j=1}^{k}\sum_{i=1}^{n_j}(\overline Y_{.j}-\overline Y_{..})^2\\ &=SSE+SSTR \end{align}\] where the cross term vanishes because \(\sum_{i=1}^{n_j}(Y_{ij}-\overline Y_{.j})=0\) for each \(j\).
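The partition \(SSTOT = SSE + SSTR\) holds exactly for any data set, which is easy to confirm numerically (the data below are made up):

```python
# Verify the ANOVA partition SSTOT = SSE + SSTR on made-up data.
samples = [[4.1, 5.0, 4.7], [6.2, 5.8, 6.0, 6.4], [5.1, 5.3]]

all_y = [y for s in samples for y in s]
n = len(all_y)
ybar = sum(all_y) / n                          # overall mean Ybar_{..}
means = [sum(s) / len(s) for s in samples]     # group means Ybar_{.j}

sstot = sum((y - ybar) ** 2 for y in all_y)
sse = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)
sstr = sum(len(s) * (m - ybar) ** 2 for s, m in zip(samples, means))
```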
Because \(SSTR/\sigma^2\) and \(SSE/\sigma^2\) are independent \(\chi^2\) random variables with \(k - 1\) and \(n - k\) df, if \(H_0:\mu_1 = \mu_2 = \cdots = \mu_k\) is true, then \[F=\frac{SSTR/(k-1)}{SSE/(n-k)}\] has an \(F\) distribution with \(k - 1\) and \(n - k\) df.
ANOVA table for testing \(H_0:\mu_1 = \mu_2 = \cdots = \mu_k\):
\[ \begin{array}{lccccc} \hline \text{Source} & df & SS & MS & F & P \\ \hline \text{Treatment} & k-1 & SSTR & MSTR=\dfrac{SSTR}{k-1} & \dfrac{MSTR}{MSE} & P(F_{k-1,n-k} \ge F_{\text{observed}}) \\ \text{Error} & n-k & SSE & MSE=\dfrac{SSE}{n-k} & & \\ \text{Total} & n-1 & SSTOT & & & \\ \hline \end{array} \] The \(F\)-test rejects \(H_0:\mu_1 = \mu_2 = \cdots = \mu_k\) at level \(\alpha\) if \(F=\frac{MSTR}{MSE}>F_{k-1,n-k}(\alpha)\).
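The whole table can be computed by hand in a few lines and cross-checked against `scipy.stats.f_oneway`; this sketch assumes SciPy is installed and uses made-up data:

```python
# One-way ANOVA by hand, cross-checked against scipy.stats.f_oneway.
from scipy import stats

samples = [[4.1, 5.0, 4.7], [6.2, 5.8, 6.0, 6.4], [5.1, 5.3]]

k = len(samples)                               # number of treatment levels
n = sum(len(s) for s in samples)               # total sample size
ybar = sum(sum(s) for s in samples) / n        # overall mean
means = [sum(s) / len(s) for s in samples]     # group means

sstr = sum(len(s) * (m - ybar) ** 2 for s, m in zip(samples, means))
sse = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)

mstr = sstr / (k - 1)                  # MSTR, with k - 1 df
mse = sse / (n - k)                    # MSE, with n - k df
F = mstr / mse
p = stats.f.sf(F, k - 1, n - k)        # P(F_{k-1, n-k} >= observed F)

F_ref, p_ref = stats.f_oneway(*samples)   # library result for comparison
```

The hand-computed \(F\) and \(p\) should match the library output to floating-point precision; the test rejects \(H_0\) at level \(\alpha\) when \(p < \alpha\).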