```r
## Packages
library(dplyr)
library(mmrm)

## Data
mmrm::bcva_data %>%
  dplyr::rename_with(toupper) %>%
  dplyr::slice_head(n = 15)
```

6 Linear Mixed Model
Expanding upon the concepts introduced in Chapter 5 and Section 3, the following extends linear regression to account for individual variability in the data (see Example 6.1). While linear regression assumes that all observations are independent, this assumption is often violated in practice when multiple observations come from the same subject, family, or group. Linear mixed models address this limitation by incorporating both fixed effects (see Section 2) and random effects. Analogous to how the fixed effects enter the linear regression, the random effects are included as \(Z \cdot \beta\), where \(Z\) denotes the design matrix for the random effects and \(\beta\) the random-effect regression coefficients. Hereby, the linear mixed model is defined as: \[ y = X \cdot \alpha + Z \cdot \beta + \varepsilon. \]
Unlike linear regression where \(\varepsilon\) is the only source of variability, the linear mixed model decomposes the total variability into two components: between-individual variability (captured by \(\beta\)) and within-individual variability (captured by \(\varepsilon\)). This decomposition is particularly important when observations within the same individual are correlated. The random effects are typically assumed to follow a Gaussian distribution (see Section A.9) with mean zero and covariance matrix \(D\), while the errors are assumed to follow a Gaussian distribution with mean zero and covariance matrix \(R\). The choice of covariance structure for both \(D\) and \(R\) can have substantial impact on the model fit and inference (see Chapter 3).
Differences from Linear Regression
The primary difference between linear mixed models and linear regression lies in the handling of correlation structures. Linear regression assumes that all observations are independent, with constant variance \(\sigma^2\) for the errors. This assumption is encoded in the covariance matrix of \(y\), which is \(\sigma^2 I\) where \(I\) is the identity matrix. In contrast, the linear mixed model allows for a more complex covariance structure \(V = ZDZ^\top + R\), where \(ZDZ^\top\) captures the between-individual variability and \(R\) captures the within-individual variability. This flexibility enables the model to account for correlation between observations from the same individual while maintaining independence between individuals.
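To make this covariance structure concrete, the following base-R sketch (with hypothetical dimensions: two subjects, three visits each, a single random intercept) builds \(V = ZDZ^\top + R\) and shows its block-diagonal, compound-symmetry form:

```r
# Two subjects with three visits each, one random intercept per subject
n_subj <- 2; n_visits <- 3; n <- n_subj * n_visits

# Z: maps each observation to its subject's random intercept
Z <- kronecker(diag(n_subj), matrix(1, nrow = n_visits, ncol = 1))

# D: covariance of the random intercepts (hypothetical variance 2)
D <- diag(2, n_subj)

# R: within-subject error covariance (hypothetical variance 1, independent errors)
R <- diag(1, n)

# Marginal covariance of y
V <- Z %*% D %*% t(Z) + R
V
```

Observations within the same subject share covariance 2 (the random-intercept variance), while observations from different subjects have covariance 0, so the independence-between-individuals structure is visible directly in the off-diagonal blocks of \(V\).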
Another key difference is in the interpretation of the regression coefficients. In linear regression, the fixed effects \(\alpha\) represent the change in the expected response for a one-unit increase in the covariate, averaging over all observations. In linear mixed models, the fixed effects \(\alpha\) have the same interpretation at the population level, but the model additionally estimates random effects \(\beta\) that quantify how much each individual deviates from this population average. This allows for individual-specific predictions that account for each subject’s unique characteristics.
The estimation procedure also differs between the two approaches. Linear regression typically uses ordinary least squares, which minimizes the sum of squared residuals. Linear mixed models require more complex estimation methods such as maximum likelihood (ML) or restricted maximum likelihood (REML), which account for both the fixed- and random effects simultaneously. REML is often preferred over ML because it provides less biased estimates of covariance components (see Section 6.1.2), particularly when the number of fixed effects is large relative to the sample size.
Linear mixed models should be considered when the data includes repeated measurements, clustered observations, or hierarchical relationships. Examples include clinical studies where individuals are measured at multiple time points, multi-center clinical trials where individuals are nested within sites, or educational studies where students are nested within schools. In these cases, observations within the same group are likely to be more similar to each other than to observations from different groups, violating the independence assumption of linear regression. The inclusion of random effects is justified when there is evidence of individual-level variability that is not fully explained by the fixed effects. If all individuals respond identically to the covariates, then linear regression may be sufficient. However, if individuals exhibit heterogeneity in their baseline values, response rates, or other characteristics, then random effects are necessary to capture this variability. Failing to account for this correlation can lead to underestimated standard errors, anti-conservative confidence intervals, and inflated Type I error rates.
Shortcomings
Despite their flexibility, linear mixed models have limitations. Firstly, like linear regression, a linear relationship between the covariates and the response is assumed, which may not hold for all data. When this assumption is violated, non-linear mixed models (see Chapter 7) should be considered. Secondly, linear mixed models require the specification of both fixed- and random-effects structures as well as covariance structures (see Chapter 3), which introduces additional complexity and risk of misspecification: an incorrectly specified random-effects structure can lead to biased estimates.
Moreover, estimation of linear mixed models is computationally more demanding than linear regression, particularly for large datasets with complex random effects structures. The maximum likelihood estimation requires iterative optimization algorithms that may fail to converge or converge to local maxima rather than global maxima. Additionally, linear mixed models assume that random effects and errors are Gaussian distributed. While the model is relatively robust to moderate departures from this assumption, severe violations can affect the validity of inference, particularly for small sample sizes.
Furthermore, the interpretation of fixed effects in linear mixed models can be more nuanced than in linear regression: they are effects conditional on the random effects, although for linear models these coincide with the population-averaged effects. This distinction becomes important when making predictions or interpreting the magnitude of covariate effects.
Example 6.1
```r
## Packages
library(tidyverse)
library(lme4)
library(mmrm)
library(broom.mixed)

## Data
data <- mmrm::bcva_data %>%
  dplyr::rename_with(toupper)

## Model
# * Random intercept for each subject (1 | USUBJID)
lmm_fit <- lme4::lmer(BCVA_CHG ~ VISITN + BCVA_BL + (1 | USUBJID), data = data)

## Coefficients
broom.mixed::tidy(lmm_fit, conf.int = TRUE, conf.level = 0.95)

## Plot
data %>%
  ggplot2::ggplot(aes(x = VISITN, y = BCVA_CHG)) +
  ggplot2::geom_point() +
  ggplot2::labs(x = "Visit Number", y = "Change in BCVA") +
  ggplot2::geom_line(
    data = data %>%
      dplyr::mutate(predicted = predict(lmm_fit)),
    aes(x = VISITN, y = predicted),
    color = "blue"
  ) +
  ggplot2::theme_minimal()
```

The equivalent SAS code:

```sas
PROC MIXED data = data method = REML;
  class USUBJID;
  model BCVA_CHG = VISITN BCVA_BL / solution;
  random intercept / subject = USUBJID;
run;
```
6.1 Estimation
The estimation process for linear mixed models differs from linear regression, as the inclusion of random effects introduces additional parameters that need to be estimated. The estimation typically involves three main steps: estimation of fixed effects, estimation of covariance parameters, and prediction of random effects. Each step requires specific methods and considerations to ensure accurate and reliable results.
To ease the estimation process, it will be assumed that the random effects and errors follow a Gaussian distribution (see Section A.9), that is \[ \beta \thicksim\operatorname{N}\left(0, D\right) \quad \text{and} \quad \varepsilon \thicksim\operatorname{N}\left(0, R\right). \] Under this assumption, the marginal distribution of the response variable \(y\) is also Gaussian with mean \(X \cdot \alpha\) and covariance matrix \(V = ZDZ^\top + R\). This would mean that the likelihood function can be expressed as \[ \begin{aligned} \mathcal{L}(\alpha, D, R \mid y) = \frac{1}{(2\pi)^{n/2} \sqrt{\operatorname{det}\left(V\right)}} \cdot \operatorname{exp}\left(-\frac{1}{2} (y - X\alpha)^\top V^{-1} (y - X\alpha)\right) \end{aligned} \tag{6.1}\] with the log-likelihood function given by \[ \ell(\alpha, D, R \mid y) = -\frac{n}{2} \operatorname{log}\left(2\pi\right) - \frac{1}{2} \left(\operatorname{log}\left(\operatorname{det}\left(V\right)\right) + (y - X\alpha)^\top V^{-1} (y - X\alpha)\right) \tag{6.2}\]
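As a quick sanity check of Equation 6.2, the following base-R sketch (hypothetical data, using the simplest case \(V = \sigma^2 I\), i.e., no random effects) evaluates the log-likelihood formula directly and compares it against the sum of independent Gaussian log-densities:

```r
set.seed(1)
n <- 10
X <- cbind(1, rnorm(n))           # design matrix: intercept plus one covariate
alpha <- c(0.5, 2)                # hypothetical fixed effects
sigma2 <- 1.5
y <- X %*% alpha + rnorm(n, sd = sqrt(sigma2))

V <- diag(sigma2, n)              # simplest case: V = sigma^2 * I
resid <- y - X %*% alpha

# Log-likelihood evaluated via Equation 6.2
ll_formula <- -n / 2 * log(2 * pi) -
  0.5 * (determinant(V, logarithm = TRUE)$modulus +
           t(resid) %*% solve(V) %*% resid)

# Reference: sum of independent Gaussian log-densities
ll_ref <- sum(dnorm(y, mean = X %*% alpha, sd = sqrt(sigma2), log = TRUE))

all.equal(as.numeric(ll_formula), ll_ref)
```

For a diagonal \(V\) the two computations must agree exactly; with random effects present, \(V\) is no longer diagonal and the matrix form of Equation 6.2 is required.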
6.1.1 Estimation of Fixed Effects
Given the covariance matrix \(V\), the fixed effects are estimated via generalized least squares, which minimizes the weighted residual sum of squares \((y - X\alpha)^\top V^{-1} (y - X\alpha)\). Setting the derivative with respect to \(\alpha\) equal to zero yields \[ \hat{\alpha} = \left(X^\top V^{-1} X\right)^{-1} \left(X^\top V^{-1} y\right) \]
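A minimal base-R sketch of this estimator (hypothetical data; for \(V = I\), generalized least squares reduces to ordinary least squares, which provides an easy check):

```r
set.seed(2)
n <- 20
X <- cbind(1, rnorm(n))
y <- X %*% c(1, 3) + rnorm(n)

# Hypothetical marginal covariance: identity, so GLS coincides with OLS
V <- diag(n)
V_inv <- solve(V)

# Generalized least squares estimate of the fixed effects
alpha_gls <- solve(t(X) %*% V_inv %*% X, t(X) %*% V_inv %*% y)

# Reference: ordinary least squares via lm()
alpha_ols <- coef(lm(y ~ X - 1))

all.equal(as.numeric(alpha_gls), as.numeric(alpha_ols))
```

In practice \(V\) depends on the unknown covariance parameters, so \(\hat{\alpha}\) is computed with \(V\) replaced by its current estimate and the two steps are iterated.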
6.1.2 Estimation of Covariance Parameters
6.1.2.1 Maximum Likelihood Estimation
To estimate the covariance parameters, the maximum likelihood estimation (MLE) approach can be used. This involves maximizing the profile log-likelihood function \[ \ell_{\text{profile}}\left(D, R \mid y\right) \approx -\frac{1}{2} \operatorname{log}\left(\operatorname{det}\left(V\right)\right) - \frac{1}{2} (y - X\hat{\alpha})^\top V^{-1} (y - X\hat{\alpha}) \] where \(\hat{\alpha}\) is the estimate of the fixed effects obtained from the previous step. However, it is important to note that MLE can lead to biased estimates of the covariance parameters, particularly when the number of fixed effects \(p\) is large relative to the sample size \(n\). This bias arises because MLE does not account for the loss of degrees of freedom due to the estimation of fixed effects.
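For the simplest case \(V = \sigma^2 I\), the profile log-likelihood can be maximized numerically; the sketch below (hypothetical data) recovers the well-known closed-form ML estimate \(\hat{\sigma}^2_{\text{ML}} = \text{RSS}/n\):

```r
set.seed(3)
n <- 50
X <- cbind(1, rnorm(n))
y <- X %*% c(2, -1) + rnorm(n, sd = 2)

alpha_hat <- solve(t(X) %*% X, t(X) %*% y)   # OLS = GLS when V = sigma^2 * I
rss <- sum((y - X %*% alpha_hat)^2)

# Profile log-likelihood as a function of sigma^2 (additive constants dropped)
profile_ll <- function(sigma2) {
  -n / 2 * log(sigma2) - rss / (2 * sigma2)
}

opt <- optimize(profile_ll, interval = c(1e-6, 100), maximum = TRUE)
c(numeric_mle = opt$maximum, closed_form = rss / n)
```

With random effects, \(V\) depends on several covariance parameters and no closed form exists, so general-purpose optimizers are applied to the (restricted) profile log-likelihood in the same way.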
\[ \begin{align*} \mathbb{E}\left[\sigma_{\text{ML}}^2\right] &= \frac{1}{n} \mathbb{E}\left[(y - X\hat{\alpha})^\top (y - X\hat{\alpha})\right] \\ &= \frac{n - p}{n} \cdot \sigma^2 \\ &= \sigma^2 \cdot \left(1 - \frac{p}{n}\right) \end{align*} \]
6.1.2.2 Restricted Maximum Likelihood Estimation
As opposed to MLE, the restricted maximum likelihood estimation (REML) approach accounts for the loss of degrees of freedom due to the estimation of fixed effects. REML achieves this by maximizing a modified likelihood function that is based on the residuals from the fixed effects model. Maximizing the REML likelihood function leads to unbiased estimates of the covariance parameters, as it effectively adjusts for the number of fixed effects \(p\) being estimated. In expectation, the REML estimate of \(\sigma^2\) satisfies \[ \begin{align*} \mathbb{E}\left[\sigma_{\text{REML}}^2\right] &= \frac{1}{n - p} \mathbb{E}\left[(y - X\hat{\alpha})^\top (y - X\hat{\alpha})\right] \\ &= \sigma^2 \end{align*} \]
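The bias difference can be illustrated by simulation. The sketch below (a hypothetical linear model with \(p = 5\) coefficients, \(n = 20\) observations, and no random effects, for simplicity) compares the average ML and REML variance estimates against the true \(\sigma^2 = 1\):

```r
set.seed(4)
n <- 20; p <- 5; sigma2 <- 1
n_sim <- 2000

est <- replicate(n_sim, {
  X <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))
  y <- X %*% rnorm(p) + rnorm(n, sd = sqrt(sigma2))
  rss <- sum(residuals(lm(y ~ X - 1))^2)
  c(ml = rss / n, reml = rss / (n - p))
})

rowMeans(est)
```

The ML average underestimates \(\sigma^2\) by roughly the factor \((n - p)/n = 0.75\), while the REML average is approximately 1; the gap shrinks as \(n\) grows relative to \(p\), which is why the two methods largely agree for large samples.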