Matrix algebra is extremely useful, but it can be confusing at first because it differs in some key ways from normal algebra. Say we have two matrices
\[\textbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},\]
and
\[\textbf{B} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}.\]
If we simply multiply the two matrices element-wise (\(\textbf{A}\circ\textbf{B}\)) using the familiar rules of scalar multiplication (i.e., take the Hadamard product), then we get the following,
\[\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \circ \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 5 & 12 \\ 21 & 32 \end{bmatrix}.\]
Matrix multiplication works differently. To get \(\textbf{A}\) times \(\textbf{B}\), we take the first element of a row of \(\textbf{A}\) multiplied by the first element of a column of \(\textbf{B}\), then add this to the second element of that row of \(\textbf{A}\) multiplied by the second element of that column of \(\textbf{B}\); the result goes in the corresponding row and column of the product. See below for what this looks like in practice,
\[\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} [(1 \times 5) + (2 \times 7)] & [(1 \times 6) + (2 \times 8)] \\ [(3 \times 5) + (4 \times 7)] & [(3 \times 6) + (4 \times 8)] \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}.\]
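We can confirm both calculations in R, where `*` multiplies element-wise (the Hadamard product) and `%*%` performs matrix multiplication (a quick sketch; note that `matrix` fills by column by default).

A <- matrix(data = c(1, 3, 2, 4), nrow = 2); # The matrix A from above
B <- matrix(data = c(5, 7, 6, 8), nrow = 2); # The matrix B from above
A * B;   # Hadamard product: 5, 12, 21, 32
A %*% B; # Matrix product: 19, 22, 43, 50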
This might seem arbitrary or strange at first, but there are a lot of good reasons that matrix multiplication works this way. In statistics, one really cool property is that a row vector multiplied by its transpose (rows swapped for columns) gives the sum of squares of its elements,
\[\begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = 1^2 + 2^2 + 3^2 + 4^2 = 30.\]
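This is easy to check in R with a one-row matrix (i.e., a row vector).

x <- matrix(data = 1:4, nrow = 1); # Row vector [1, 2, 3, 4]
x %*% t(x);                        # Sum of squares: 30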
The reason for introducing this is that we will need some matrix algebra to understand the mixed model. I will keep this as simple as possible to give an idea of what is going on. Note that matrix addition works as you would expect (element by element),
\[\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}.\]
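Using the `A` and `B` defined above, `+` adds element by element in R.

A + B; # Element-wise sum: 6, 8, 10, 12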
To start off, let us consider the familiar linear model,
\[y_i = \beta_{0} + \beta_{1}x_i + \epsilon_{i}.\]
In the above, we are predicting the value of data point \(i\) (\(y_{i}\)) from \(i\)’s value of \(x\) (\(x_{i}\)) using the intercept (\(\beta_{0}\)), the slope (\(\beta_{1}\)), and the error of data point \(i\) (\(\epsilon_{i}\)). Note that \(\epsilon_{i}\) is just the residual value for \(i\); that is, its distance from the regression line (positive values mean that the data point is above the line, negative values mean that it is below).
We need to start out with a small hypothetical data set. We are not looking for something realistic to test a hypothesis; we only need something that will show what is going on mathematically, so consider the data below.
Y | X |
---|---|
9.44 | 6.12 |
11.24 | 6.27 |
10.64 | 4.58 |
10.36 | 4.70 |
7.69 | 4.70 |
9.93 | 3.88 |
If we fit a simple linear regression model, then we find an intercept of \(\beta_{0} =\) 8.335781 and a slope of \(\beta_{1} =\) 0.3069525. Our six model residuals are -0.7743305, 0.9796267, 0.8983764, 0.5815421, -2.0884579, and 0.4032432. We can put all of these into a table.
y_i | beta_0 | beta_1 | x_i | epsilon_i |
---|---|---|---|---|
9.44 | 8.335781 | 0.3069525 | 6.12 | -0.7743305 |
11.24 | 8.335781 | 0.3069525 | 6.27 | 0.9796267 |
10.64 | 8.335781 | 0.3069525 | 4.58 | 0.8983764 |
10.36 | 8.335781 | 0.3069525 | 4.70 | 0.5815421 |
7.69 | 8.335781 | 0.3069525 | 4.70 | -2.0884579 |
9.93 | 8.335781 | 0.3069525 | 3.88 | 0.4032432 |
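As an aside, these values can be reproduced in R with `lm` (a minimal sketch; I assume here that the data have been placed in a data frame called `dat` with columns `Y` and `X`).

dat <- data.frame(Y = c(9.44, 11.24, 10.64, 10.36, 7.69, 9.93),
                  X = c(6.12, 6.27, 4.58, 4.70, 4.70, 3.88));
fit <- lm(Y ~ X, data = dat); # Simple linear regression of Y on X
coef(fit);      # Intercept (8.335781) and slope (0.3069525)
residuals(fit); # The six model residuals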
I have written it the way I did above so that you can see how each of the values corresponds to our general linear model,
\[y_i = \beta_{0} + \beta_{1}x_i + \epsilon_{i}.\]
To get the \(y\) value for \(i = 1\) (\(y_{1}\)), we just need to use the values in the first row,
\[y_1 = 8.335781 + 0.3069525 (6.12) + (-0.7743305).\]
You can confirm that the maths works here (the right hand side comes to 9.44). Another way to write the general linear model is with matrix notation,
\[\textbf{Y} = \textbf{X}\beta + \textbf{e}.\]
For the above example, this would look as follows,
\[\begin{bmatrix} 9.44 \\ 11.24 \\ 10.64 \\ 10.36 \\ 7.69 \\ 9.93 \end{bmatrix} = \begin{bmatrix} 1 & 6.12 \\ 1 & 6.27 \\ 1 & 4.58 \\ 1 & 4.7 \\ 1 & 4.7 \\ 1 & 3.88 \end{bmatrix}\begin{bmatrix} 8.335781 \\ 0.3069525 \end{bmatrix}+\begin{bmatrix} -0.7743305 \\ 0.9796267 \\ 0.8983764 \\ 0.5815421 \\ -2.0884579 \\ 0.4032432 \end{bmatrix}.\]
We can do the multiplication of \(\textbf{X}\beta\) first,
\[\begin{bmatrix} 9.44 \\ 11.24 \\ 10.64 \\ 10.36 \\ 7.69 \\ 9.93 \end{bmatrix} = \begin{bmatrix} 10.2143305 \\ 10.2603733 \\ 9.7416236 \\ 9.7784579 \\ 9.7784579 \\ 9.5267568 \end{bmatrix} + \begin{bmatrix} -0.7743305 \\ 0.9796267 \\ 0.8983764 \\ 0.5815421 \\ -2.0884579 \\ 0.4032432 \end{bmatrix}.\]
Now we just need to add the two matrices on the right hand side together,
\[\begin{bmatrix} 9.44 \\ 11.24 \\ 10.64 \\ 10.36 \\ 7.69 \\ 9.93 \end{bmatrix} = \begin{bmatrix} 9.44 \\ 11.24 \\ 10.64 \\ 10.36 \\ 7.69 \\ 9.93 \end{bmatrix}.\]
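We can verify this whole matrix calculation in R too (a sketch continuing from the hypothetical `dat` and `fit` objects above; `%*%` is matrix multiplication).

X    <- cbind(1, dat$X);           # Design matrix: intercept column plus x values
beta <- as.matrix(coef(fit));      # Column vector of coefficients
e    <- as.matrix(residuals(fit)); # Column vector of residuals
X %*% beta + e;                    # Reproduces the Y values exactly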
Remember that we can have any number of \(\beta\) coefficients that we want (assuming that we are not overparameterising the model). Here I have used an example with only one continuous \(x\) variable (6.12, 6.27, 4.58, 4.7, 4.7, 3.88). We could just as well add a categorical variable \(x_{2}\) that might distinguish two groups (e.g., ‘species_1’ versus ‘species_2’), making \(x_{2}\) a binary column that would be added to \(\textbf{X}\) with its own coefficient \(\beta_{2}\),
\[\textbf{X} = \begin{bmatrix} 1 & 6.12 & 0 \\ 1 & 6.27 & 0 \\ 1 & 4.58 & 0 \\ 1 & 4.7 & 1 \\ 1 & 4.7 & 1 \\ 1 & 3.88 & 1 \end{bmatrix}.\]
The coefficients would then be \(\beta = \left[\beta_0, \beta_1, \beta_2\right]\). The details are not so important, but I just want to emphasise that \(\textbf{X}\) (and \(\textbf{Y}\)) can have any number of columns. Remember that typically all we have are \(\textbf{X}\) and \(\textbf{Y}\) to start with, and we need to solve for the \(\beta\) values,
\[\begin{bmatrix} 9.44 \\ 11.24 \\ 10.64 \\ 10.36 \\ 7.69 \\ 9.93 \end{bmatrix} = \begin{bmatrix} 1 & 6.12 \\ 1 & 6.27 \\ 1 & 4.58 \\ 1 & 4.7 \\ 1 & 4.7 \\ 1 & 3.88 \end{bmatrix}\begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix}+\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \epsilon_6 \end{bmatrix}.\]
Note that \(\textbf{X}\) is sometimes called the ‘design matrix’.
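I will not go through the derivation here, but for ordinary least squares the solution is \(\hat{\beta} = \left(\textbf{X}^{\top}\textbf{X}\right)^{-1}\textbf{X}^{\top}\textbf{Y}\), which we can compute directly in R (a sketch reusing the `X` and `dat` from above).

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% dat$Y; # Least squares solution
print(beta_hat); # Matches the intercept and slope from lm()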
The model discussed above focused entirely on fixed effects. Now we will add random effects to build a mixed model. I am going to deliberately avoid defining ‘fixed effect’ and ‘random effect’, and instead just focus on what changes mathematically. We are going to add a new term \(\textbf{Z}\textbf{u}\) to the above model (note that a lot of papers use \(\textbf{b}\) instead of \(\textbf{u}\), but I want to avoid potential confusion with \(\beta\)). Our mixed model equation just looks like the below,
\[\textbf{Y} = \textbf{X}\beta + \textbf{Z}\textbf{u} + \textbf{e}.\]
What we are doing is adding the product of group indicators (\(\textbf{Z}\)) and their random effects (\(\textbf{u}\), each group’s deviation \(u_{i}\) from the mean of 0). We can read in the `nlme` R library and the data set `dung`, which looks like the below.
library(nlme);
dung <- read.csv("dung.csv");
print(dung);
## pr_dung flies oven_block
## 1 0.1433747 223 1
## 2 0.1551917 0 1
## 3 0.1413354 100 1
## 4 0.1380987 97 1
## 5 0.1488016 130 1
## 6 0.1443227 169 1
## 7 0.1428871 183 1
## 8 0.1705165 77 2
## 9 0.1593878 0 2
## 10 0.1651354 51 2
## 11 0.1747078 49 2
## 12 0.1904174 81 2
## 13 0.1806727 45 3
## 14 0.1700109 98 3
## 15 0.1654107 168 3
## 16 0.1803931 22 3
## 17 0.1727390 4 3
## 18 0.1701061 98 3
This is a subset of data from a set of experiments that I ran in which containers (rows) of wet cow dung were placed out for a parent generation of dung flies to lay their eggs. The mass of the wet dung was recorded before the experiment for each container. Offspring that emerged from each dung container were counted (`flies`), and then the wet dung was placed in the oven to get the dry mass. The proportion of mass still remaining after drying was calculated (`pr_dung`), but due to experimental constraints (timing of experiments and the size of the oven), dung from containers needed to be placed in the oven in blocks (`oven_block`). Hence, the block in which the container went into the oven could affect `pr_dung`, so we want to include this as a random effect in the model. To summarise, the three columns of data above are as follows:

- `pr_dung`: the proportion of wet dung mass remaining after drying (our response variable)
- `flies`: the number of offspring flies that emerged from the container
- `oven_block`: the block in which the container was placed in the oven (our random effect)
We can place the known values from the data set above into our mixed model equation,
\[\begin{bmatrix} 0.1433747 \\ 0.1551917 \\ 0.1413354 \\ 0.1380987 \\ 0.1488016 \\ 0.1443227 \\ 0.1428871 \\ 0.1705165 \\ 0.1593878 \\ 0.1651354 \\ 0.1747078 \\ 0.1904174 \\ 0.1806727 \\ 0.1700109 \\ 0.1654107 \\ 0.1803931 \\ 0.172739 \\ 0.1701061 \\ \end{bmatrix} = \begin{bmatrix} 1 & 223 \\ 1 & 0 \\ 1 & 100 \\ 1 & 97 \\ 1 & 130 \\ 1 & 169 \\ 1 & 183 \\ 1 & 77 \\ 1 & 0 \\ 1 & 51 \\ 1 & 49 \\ 1 & 81 \\ 1 & 45 \\ 1 & 98 \\ 1 & 168 \\ 1 & 22 \\ 1 & 4 \\ 1 & 98 \\ \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix}+ \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} u_{1} \\ u_{2} \\ u_{3} \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \epsilon_6 \\ \epsilon_7 \\ \epsilon_8 \\ \epsilon_9 \\ \epsilon_{10} \\ \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \epsilon_{14} \\ \epsilon_{15} \\ \epsilon_{16} \\ \epsilon_{17} \\ \epsilon_{18} \end{bmatrix}.\]
Here is how we would run a mixed effects model in R.
mod <- lme(pr_dung ~ flies, random = ~ 1|oven_block, data = dung);
I do not want to focus on the R notation here, or on the output, but we can grab the values of \(\beta\), \(\textbf{u}\), and \(\textbf{e}\) to show how they relate to our equation. We can pull these out of `mod` and put them in matrix form as follows.
beta <- as.matrix(fixed.effects(mod));
uu <- as.matrix(random.effects(mod));
ee <- as.matrix(residuals(mod));
Here are our `beta` (\(\beta\)) values.
print(beta);
## [,1]
## (Intercept) 1.662317e-01
## flies -3.466743e-05
Here are our `uu` (\(\textbf{u}\)) values.
print(uu);
## (Intercept)
## 1 -0.016207919
## 2 0.007158668
## 3 0.009049251
Note that the \(\textbf{u}\) values sum to zero (you can confirm this with `sum(uu)`). Each \(u_{i}\) is a deviation from the mean of \(u\), where \(u\) is normally distributed around zero. Here are our `ee` (\(\epsilon\)) values (i.e., the residuals).
print(ee);
## [,1]
## 1 0.0010817597
## 1 0.0051679006
## 1 -0.0052216740
## 1 -0.0085623576
## 1 0.0032845742
## 1 0.0001577298
## 1 -0.0007925669
## 2 -0.0002044497
## 2 -0.0140025103
## 2 -0.0064868708
## 2 0.0030161277
## 2 0.0198351347
## 3 0.0069518305
## 3 -0.0018725961
## 3 -0.0040461441
## 3 0.0058748860
## 3 -0.0024033180
## 3 -0.0017774556
Now we can put these values back into the equation above and confirm that the right side of the equation equals the left side,
\[\begin{bmatrix} 0.1433747 \\ 0.1551917 \\ 0.1413354 \\ 0.1380987 \\ 0.1488016 \\ 0.1443227 \\ 0.1428871 \\ 0.1705165 \\ 0.1593878 \\ 0.1651354 \\ 0.1747078 \\ 0.1904174 \\ 0.1806727 \\ 0.1700109 \\ 0.1654107 \\ 0.1803931 \\ 0.1727390 \\ 0.1701061 \\ \end{bmatrix} = \begin{bmatrix} 1 & 223 \\ 1 & 0 \\ 1 & 100 \\ 1 & 97 \\ 1 & 130 \\ 1 & 169 \\ 1 & 183 \\ 1 & 77 \\ 1 & 0 \\ 1 & 51 \\ 1 & 49 \\ 1 & 81 \\ 1 & 45 \\ 1 & 98 \\ 1 & 168 \\ 1 & 22 \\ 1 & 4 \\ 1 & 98 \\ \end{bmatrix} \begin{bmatrix} 0.1662317 \\ -0.00003466743 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} -0.016207919 \\ 0.007158668 \\ 0.009049251 \end{bmatrix} + \begin{bmatrix} 0.0010818 \\ 0.0051679 \\ -0.0052217 \\ -0.0085624 \\ 0.0032846 \\ 0.0001577 \\ -0.0007926 \\ -0.0002044 \\ -0.0140025 \\ -0.0064869 \\ 0.0030161 \\ 0.0198351 \\ 0.0069518 \\ -0.0018726 \\ -0.0040461 \\ 0.0058749 \\ -0.0024033 \\ -0.0017775 \end{bmatrix}.\]
We can do some of the calculations on the right hand side to simplify,
\[\begin{bmatrix} 0.1433747 \\ 0.1551917 \\ 0.1413354 \\ 0.1380987 \\ 0.1488016 \\ 0.1443227 \\ 0.1428871 \\ 0.1705165 \\ 0.1593878 \\ 0.1651354 \\ 0.1747078 \\ 0.1904174 \\ 0.1806727 \\ 0.1700109 \\ 0.1654107 \\ 0.1803931 \\ 0.1727390 \\ 0.1701061 \\ \end{bmatrix} = \begin{bmatrix} 0.1585009 \\ 0.1662317 \\ 0.1627649 \\ 0.1628690 \\ 0.1617249 \\ 0.1603729 \\ 0.1598876 \\ 0.1635623 \\ 0.1662317 \\ 0.1644637 \\ 0.1645330 \\ 0.1634236 \\ 0.1646717 \\ 0.1628343 \\ 0.1604076 \\ 0.1654690 \\ 0.1660930 \\ 0.1628343 \end{bmatrix} + \begin{bmatrix} -0.0162079 \\ -0.0162079 \\ -0.0162079 \\ -0.0162079 \\ -0.0162079 \\ -0.0162079 \\ -0.0162079 \\ 0.0071587 \\ 0.0071587 \\ 0.0071587 \\ 0.0071587 \\ 0.0071587 \\ 0.0090493 \\ 0.0090493 \\ 0.0090493 \\ 0.0090493 \\ 0.0090493 \\ 0.0090493 \end{bmatrix} + \begin{bmatrix} 0.0010818 \\ 0.0051679 \\ -0.0052217 \\ -0.0085624 \\ 0.0032846 \\ 0.0001577 \\ -0.0007926 \\ -0.0002044 \\ -0.0140025 \\ -0.0064869 \\ 0.0030161 \\ 0.0198351 \\ 0.0069518 \\ -0.0018726 \\ -0.0040461 \\ 0.0058749 \\ -0.0024033 \\ -0.0017775 \end{bmatrix}.\]
The way that we could do this in R is as follows, first setting up \(\textbf{X}\) and \(\textbf{Z}\).
X <- cbind(rep(x = 1, times = dim(dung)[1]), dung$flies); # Intercept column plus flies
Z <- matrix(data = 0, nrow = dim(dung)[1], ncol = dim(uu)[1]); # One column per block
for(i in 1:dim(Z)[1]){ # Place a 1 in the column matching row i's oven block
    Z[i, dung$oven_block[i]] <- 1;
}
The for loop is just a concise way of getting `Z` into the form we need.
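An equivalent way to build \(\textbf{Z}\) is with `model.matrix` (a sketch; the `- 1` drops the intercept so that we get one indicator column per oven block).

Z <- model.matrix(~ factor(oven_block) - 1, data = dung); # Indicator columns for blocks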
Now `X` looks like the below.
## [,1] [,2]
## [1,] 1 223
## [2,] 1 0
## [3,] 1 100
## [4,] 1 97
## [5,] 1 130
## [6,] 1 169
## [7,] 1 183
## [8,] 1 77
## [9,] 1 0
## [10,] 1 51
## [11,] 1 49
## [12,] 1 81
## [13,] 1 45
## [14,] 1 98
## [15,] 1 168
## [16,] 1 22
## [17,] 1 4
## [18,] 1 98
This is what `Z` looks like.
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 1 0 0
## [3,] 1 0 0
## [4,] 1 0 0
## [5,] 1 0 0
## [6,] 1 0 0
## [7,] 1 0 0
## [8,] 0 1 0
## [9,] 0 1 0
## [10,] 0 1 0
## [11,] 0 1 0
## [12,] 0 1 0
## [13,] 0 0 1
## [14,] 0 0 1
## [15,] 0 0 1
## [16,] 0 0 1
## [17,] 0 0 1
## [18,] 0 0 1
We can do the matrix multiplication to get back `pr_dung` using the following.
Y <- X %*% beta + Z %*% uu + ee; # Y = X(beta) + Zu + e
print(cbind(Y, dung$pr_dung));   # Reconstructed values beside the original data
## (Intercept)
## [1,] 0.1433747 0.1433747
## [2,] 0.1551917 0.1551917
## [3,] 0.1413354 0.1413354
## [4,] 0.1380987 0.1380987
## [5,] 0.1488016 0.1488016
## [6,] 0.1443227 0.1443227
## [7,] 0.1428871 0.1428871
## [8,] 0.1705165 0.1705165
## [9,] 0.1593878 0.1593878
## [10,] 0.1651354 0.1651354
## [11,] 0.1747078 0.1747078
## [12,] 0.1904174 0.1904174
## [13,] 0.1806727 0.1806727
## [14,] 0.1700109 0.1700109
## [15,] 0.1654107 0.1654107
## [16,] 0.1803931 0.1803931
## [17,] 0.1727390 0.1727390
## [18,] 0.1701061 0.1701061
Models can of course get a lot more complex than what we just did, but this should give you a general idea of what is going on in R when you run a mixed model.