Analyzing rating scale data with cumulative link models using the ordinal package in R

Hispanic Linguistics Symposium 2022

Scott James Perry & Guilherme D. Garcia

University of Alberta, Université Laval

11/3/22

Download workshop script + data



  1. Go to http://scottjamesperry.com/resources/
  2. Download HLS 2022 ordinal workshop materials
  3. Unzip folder to preferred location
  4. Click on script ordinal_workshop_script_HLS.R to open RStudio

Other decisions are made first



Type of judgement:

  • Binary
  • Slider
  • Ordered categories


What we hope you get out of this


If you aren’t familiar with R and the regression framework:

  • Be able to explain why we shouldn’t analyze ordinal data with linear regression/ANOVA
  • A conceptual understanding of how ordinal regression works
  • Direction towards materials that we have found useful




If you are familiar with R and the regression framework:

  • Code to fit and visualize ordinal models
  • Code to fit Bayesian ordinal models

Timeline


  1. Why commonly used methods aren’t appropriate

  2. Conceptual introduction to ordinal models

  3. Fit ordinal model with clm()

  4. Interpretation/visualization

  5. Mixed-effects ordinal models & addressing common problems

  6. Advantages of Bayesian ordinal models (time permitting)

Questions?

  • Submit your question(s) by visiting:
    • https://forms.office.com/r/VfFvH0mbF5
  • Or scan the QR code below:

What is the difference between numeric and ordinal data?

Numerical data:

  • Can be explicitly measured
  • Differences are constant (1-2 same difference as 50-51)
  • Allows us to perform arithmetic operations (e.g., addition, subtraction, averages)
  • E.g., Age, Height, Time


Ordinal data:

  • Values not measured
  • Differences may not be constant (‘B.A.’-‘M.A.’ \(\neq\) ‘M.A.’-‘PhD’)
  • We cannot perform arithmetic operations (e.g., addition, subtraction, averages)
  • E.g., Education level, rating scales

Some ordinal data intuitively not numbers



  • These categories have an order
  • ‘Distance’ between categories likely not equal


Rating scales less-intuitively not numbers



  • Numbers presented have clear order
  • ‘Psychological distance’ for participants cannot be claimed to be equal



Common example of distribution of rating scale responses


Example data from a two-group acceptability judgement task (AJT)

Mismatch in assumptions between rating scales and linear models

  • Linear models expect responses to be normally distributed around the mean, which is often not the most probable rating
  • Linear models assign probability to impossible values outside the rating scale

Why we can’t treat rating scales as numbers

  • Mental distance not always equal
  • Normal distribution poorly summarizes ordinal data
  • Bounded nature of ordinal scales a problem

Strategies that do not solve this problem:

  • z-scoring by participant
  • Averaging over items/participants
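To see why averaging over items or participants does not help, here is a minimal base-R sketch with invented ratings: with a bimodal distribution, the mean points at a value nobody chose.

```r
# Hypothetical bimodal ratings on a 1-7 scale:
# half the responses are 1, half are 7
ratings <- c(1, 1, 1, 7, 7, 7)

# The mean is 4, a value no participant actually selected
mean(ratings)

# The frequency table shows two modes and nothing in between
table(ratings)
```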

So what should we do?

  • Imagine acceptability as a continuous and gradient mental phenomenon

We can estimate where the amount of ‘acceptability’ changes from a 1 to a 2

Now the entire blue area is probability of a ‘1’ response

This corresponds to the probability we saw in the raw data



We model one less threshold than the number of rating categories

Our thresholds stay still, but our latent variable moves

Group A has higher than average latent variable, higher probability of higher categories

Group B has lower than average latent variable, probability of responding ‘1’ is now over 50%.
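The picture above can be reproduced numerically. A minimal sketch, assuming two hypothetical threshold values on the logit scale: shifting the latent mean redistributes probability mass across the three ordered categories.

```r
# Two hypothetical thresholds split the latent scale into three categories
thresholds <- c(-1, 1)

# Cumulative link: P(Y <= j) = plogis(threshold_j - latent_mean);
# differencing the cumulative probabilities gives per-category probabilities
category_probs <- function(latent_mean) {
  cumulative <- plogis(thresholds - latent_mean)
  diff(c(0, cumulative, 1))
}

category_probs(0)     # baseline: most mass in the middle category
category_probs(1.5)   # 'Group A': higher latent mean, higher categories favoured
category_probs(-1.5)  # 'Group B': lower latent mean, P('1') now above 50%
```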

Outline of hypothetical experiment

Interpretation of relative clauses in L2 English

(Near-)categorical interpretation in (1); ambiguity in (2)

  1. Mary saw [the cat]\(_{a}\) of [the nurse]\(_b\) who likes to dance
  2. Mary saw [the daughter]\(_{a}\) of [the nurse]\(_b\) who likes to dance
  • Different languages have been shown to have different biases:

    • English \(\rightarrow\) low attachment (b)

    • Spanish \(\rightarrow\) high attachment (a)

Can we modulate this bias by introducing a prosodic break?

  • Three conditions:

    1. Mary saw the daughter of the nurse who likes to dance (NoBreak)
    2. Mary saw the daughter # of the nurse who likes to dance (High)
    3. Mary saw the daughter of the nurse # who likes to dance (Low)
  • Two groups (L1): Spanish and English (controls)

  • Response variable 1: Who likes to dance? (primary)

  • Response variable 2: How certain are you? (secondary)

Let’s examine our data

  • Before modelling our data, let’s visualize the patterns of interest
  • English speakers appear more certain about low attachment

Loading our simulated dataset

# Loading in data and inspecting it
load("data/rc_hls.Rdata")

# Inspecting data
summary(rc_hls)
       Certainty3    Condition         L1     
 Not certain:321   NoBreak:300   English:300  
 Neutral    :289   High   :300   Spanish:600  
 Certain    :290   Low    :300                

Fitting models

  • We fit models using the clm() function
  • Same syntax as lm() and glm()
  • This code fits an additive model
# Fitting our ordinal model
library(ordinal)  # provides clm()

clm_fit_additive <- clm(Certainty3 ~
                          Condition +
                          L1,
                        data = rc_hls)

Walking through the interpretation

summary(clm_fit_additive)
formula: Certainty3 ~ Condition + L1
data:    rc_hls

 link  threshold nobs logLik  AIC     niter max.grad cond.H 
 logit flexible  900  -974.41 1958.83 3(0)  8.00e-07 2.5e+01

Coefficients:
              Estimate Std. Error z value Pr(>|z|)   
ConditionHigh  0.45710    0.15019   3.043  0.00234 **
ConditionLow  -0.30766    0.15174  -2.028  0.04260 * 
L1Spanish     -0.09735    0.13166  -0.739  0.45966   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
                    Estimate Std. Error z value
Not certain|Neutral  -0.6153     0.1430  -4.303
Neutral|Certain       0.7506     0.1439   5.218

Fitting the model with the interaction

  • Interactions also use familiar syntax
  • This code fits the interaction
  • The model on the right is identical to Certainty3 ~ Condition * L1
# Fitting our ordinal model
clm_fit_interaction <- clm(Certainty3 ~
                             Condition +
                             L1 +
                             Condition:L1,
                           data = rc_hls)

Interpreting interaction

summary(clm_fit_interaction)
formula: Certainty3 ~ Condition + L1 + Condition:L1
data:    rc_hls

 link  threshold nobs logLik  AIC     niter max.grad cond.H 
 logit flexible  900  -923.19 1860.38 4(0)  6.65e-10 1.2e+02

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)    
ConditionHigh            -0.6426     0.2644  -2.430  0.01508 *  
ConditionLow              0.7430     0.2697   2.755  0.00587 ** 
L1Spanish                -0.1696     0.2302  -0.737  0.46133    
ConditionHigh:L1Spanish   1.7096     0.3262   5.241 1.60e-07 ***
ConditionLow:L1Spanish   -1.6126     0.3316  -4.864 1.15e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
                    Estimate Std. Error z value
Not certain|Neutral  -0.7263     0.1941  -3.742
Neutral|Certain       0.7754     0.1944   3.989

Visualizing ordinal models with sjPlot


Multiple options for visualizing ordinal models:

  • Get predicted values from predict() and manipulate them manually
  • Plot predicted values automatically with a package like sjPlot


library(sjPlot)    # plot_model()
library(ggthemes)  # theme_base()

p <- plot_model(clm_fit_interaction,
                type = "pred",
                terms = c("Condition", "L1"),
                colors = c("black", "grey50"))

p + 
  theme_base(base_size = 18) +
  ggtitle("") +
  theme(axis.text.x = element_text(angle = 310, 
                                   hjust = 0)) +
  ylab("Predicted probability of response")

Visualizing ordinal models with sjPlot

Getting predicted probabilities and plotting them
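The manual route via predict() works as follows: for a clm fit, passing newdata without the response column and type = "prob" returns one predicted probability per category. A self-contained sketch on simulated toy data (the objects from earlier slides are not reused here):

```r
library(ordinal)

# Toy data: ordinal ratings for two groups
set.seed(1)
toy <- data.frame(
  group  = rep(c("A", "B"), each = 100),
  rating = factor(sample(1:3, 200, replace = TRUE), ordered = TRUE))

fit <- clm(rating ~ group, data = toy)

# Omit the response column so predict() returns category probabilities
new_dat <- data.frame(group = c("A", "B"))
preds <- predict(fit, newdata = new_dat, type = "prob")
preds$fit  # one row per group, one column per category; rows sum to 1
```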

Fitting mixed models with package ordinal


  • Same random effects syntax as lme4
  • Harder to fit than many other GLMMs
  • Analyze matched-guise experiment (thanks Gabriella Licata!)


mg_fit_1 <- clmm(Feminine_voice ~
                   GuiseGender +
                   (1|Participant),
                data = mg_dat)

Problem with ordinal models: standard errors cannot be estimated (NaN)

Cumulative Link Mixed Model fitted with the Laplace approximation

formula: Feminine_voice ~ GuiseGender + (1 | Participant)
data:    mg_dat

 link  threshold nobs logLik  AIC    niter    max.grad cond.H
 logit flexible  260  -108.48 232.96 684(880) 1.66e-05 NaN   

Random effects:
 Groups      Name        Variance  Std.Dev. 
 Participant (Intercept) 1.039e-08 0.0001019
Number of groups:  Participant 65 

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
GuiseGenderMale   -7.908        NaN     NaN      NaN

Threshold coefficients:
    Estimate Std. Error z value
1|2   -5.428        NaN     NaN
2|3   -5.315        NaN     NaN
3|4   -5.049        NaN     NaN
4|5   -4.041        NaN     NaN
5|6   -3.177        NaN     NaN
6|7   -2.128        NaN     NaN

Looking at the raw data, we can see why

Collapsing categories does not change interpretation

Here is the model with categories 2-6 collapsed

mg_fit_2 <- clmm(Feminine_voice_collapsed ~
                   GuiseGender +
                   (1|Participant),
                 data = mg_dat,
                 nAGQ = 10)
summary(mg_fit_2)
Cumulative Link Mixed Model fitted with the adaptive Gauss-Hermite 
quadrature approximation with 10 quadrature points 

formula: Feminine_voice_collapsed ~ GuiseGender + (1 | Participant)
data:    mg_dat

 link  threshold nobs logLik AIC    niter    max.grad cond.H 
 logit flexible  260  -86.59 181.19 265(301) 5.14e-08 4.7e+06

Random effects:
 Groups      Name        Variance  Std.Dev. 
 Participant (Intercept) 8.883e-09 9.425e-05
Number of groups:  Participant 65 

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
GuiseGenderMale  -7.1763     0.7992  -8.979   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|6  -4.7012     0.7326  -6.417
6|7  -2.1247     0.2827  -7.515
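The collapsed response can be built in base R. A sketch of the recoding (Feminine_voice_collapsed and mg_dat come from the workshop materials; the vector below is invented for illustration):

```r
# Invented 7-point responses standing in for mg_dat$Feminine_voice
voice <- factor(c(1, 2, 4, 6, 7, 7), levels = 1:7, ordered = TRUE)

# Collapse the sparse middle categories 2-6 into a single level,
# matching the 1|6 and 6|7 thresholds in the model above
collapse_middle <- function(x) {
  y <- as.integer(as.character(x))
  y[y >= 2 & y <= 6] <- 6
  factor(y, levels = c(1, 6, 7), ordered = TRUE)
}

collapse_middle(voice)  # ordered levels: 1 < 6 < 7
```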

Advantages of Bayesian ordinal models

  • Convergence issues less of a problem
  • Easy and powerful plotting functions
  • Increased model flexibility (relaxing model assumptions like equal variances)

Relaxing equal variance assumption

Group A has higher than average latent variable, higher probability of higher categories

Group B has lower than average latent variable, probability of responding ‘1’ is now over 50%.

Relaxing equal variance assumption

Group A has higher than average latent variable that also has a smaller variance than Group B

Group B has lower than average latent variable with a wider scale parameter.

Why would we want to model unequal variances?

All error types possible when equal variance assumption incorrect (Liddell & Kruschke, 2018):

  • Finding effect when there is none
  • Not finding true effect
  • Finding an effect in the wrong direction

Fitting unequal variance ordinal model with Stan via brms

  • brms uses lme4-style syntax to fit Bayesian models
  • An additional parameter, disc, is added to the model, so we can predict variation in the latent scale
library(brms)  # bf(), lf()

unequal_variance_formula <- bf(
  Friendly ~
    GuiseLanguage +
    GuiseGender +
    GuiseLanguage:GuiseGender +
    (1|Participant)) +
  lf(disc ~
       GuiseLanguage +
       GuiseGender +
       GuiseLanguage:GuiseGender +
       (1|Participant)
)

Interpreting model summary

 Family: cumulative 
  Links: mu = logit; disc = log 
Formula: Friendly ~ GuiseLanguage + GuiseGender + GuiseLanguage:GuiseGender + (1 | Participant) 
         disc ~ GuiseLanguage + GuiseGender + GuiseLanguage:GuiseGender + (1 | Participant)
   Data: g_ord (Number of observations: 260) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~Participant (Number of levels: 65) 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)          1.63      0.47     0.83     2.64 1.01      735      853
sd(disc_Intercept)     0.93      0.17     0.63     1.32 1.00      968     1268

Population-Level Effects: 
                                          Estimate Est.Error l-95% CI u-95% CI
Intercept[1]                                 -5.72      1.41    -8.78    -3.29
Intercept[2]                                 -3.09      0.80    -4.82    -1.70
Intercept[3]                                 -2.12      0.59    -3.45    -1.11
Intercept[4]                                  0.04      0.31    -0.59     0.68
Intercept[5]                                  1.51      0.49     0.70     2.60
Intercept[6]                                  3.99      1.07     2.21     6.36
disc_Intercept                                0.09      0.31    -0.51     0.71
GuiseLanguageItalian                          0.42      0.30    -0.11     1.05
GuiseGenderMale                              -0.17      0.27    -0.72     0.33
GuiseLanguageItalian:GuiseGenderMale         -0.26      0.37    -1.02     0.47
disc_GuiseLanguageItalian                    -0.33      0.23    -0.80     0.12
disc_GuiseGenderMale                         -0.09      0.25    -0.57     0.39
disc_GuiseLanguageItalian:GuiseGenderMale     0.41      0.32    -0.22     1.04
                                          Rhat Bulk_ESS Tail_ESS
Intercept[1]                              1.01      942     1609
Intercept[2]                              1.00      950     1200
Intercept[3]                              1.00      961     1256
Intercept[4]                              1.00     1645     1936
Intercept[5]                              1.01      985     2218
Intercept[6]                              1.01      869     1425
disc_Intercept                            1.00      981     1480
GuiseLanguageItalian                      1.00     1982     2447
GuiseGenderMale                           1.00     2232     2348
GuiseLanguageItalian:GuiseGenderMale      1.00     2237     2684
disc_GuiseLanguageItalian                 1.00     1797     2266
disc_GuiseGenderMale                      1.00     1868     2387
disc_GuiseLanguageItalian:GuiseGenderMale 1.00     2030     2187

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
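Because disc uses a log link and brms parameterizes the latent standard deviation as 1/disc, a disc coefficient converts to a ratio of latent standard deviations by exponentiating and inverting; a quick check using the disc_GuiseLanguageItalian estimate from the summary above (interpretation per Bürkner & Vuorre, 2019):

```r
disc_coef <- -0.33  # disc_GuiseLanguageItalian, posterior mean from above

# log link: multiplicative change in the discrimination parameter
disc_ratio <- exp(disc_coef)

# latent SD is 1/disc, so the Italian guise's latent SD is ~1.39x the baseline
sd_ratio <- 1 / disc_ratio
round(sd_ratio, 2)
```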

Materials dealing with ordinal models

Garcia, G. D. (2021). Data visualization and analysis in second language research. Routledge.

McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC.

Christensen, R. H. B. (2019). A tutorial on fitting cumulative link mixed models with clmm2 from the ordinal package. R package vignette. https://cran.r-project.org/web/packages/ordinal/

Materials dealing with ordinal models

Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with metric models: What could possibly go wrong?. Journal of Experimental Social Psychology, 79, 328-348.

Bürkner, P. C., & Vuorre, M. (2019). Ordinal regression models in psychology: A tutorial. Advances in Methods and Practices in Psychological Science, 2(1), 77-101.

Veríssimo, J. (2021). Analysis of rating scales: A pervasive problem in bilingualism research and a solution with Bayesian ordinal models. Bilingualism: Language and Cognition, 24(5), 842-848.