
Factor scores from factor analysis on ordinal categorical data in R

I'm having trouble computing factor scores from an exploratory factor analysis on ordered categorical data. I've managed to assess how many factors to extract, and to run the factor analysis using the psych package, but I can't figure out how to get factor scores for individual participants, and I haven't found much help online. Here is where I'm stuck:

library(polycor)
library(nFactors)
library(psych)

# load data
dat <- read.csv("https://raw.githubusercontent.com/paulrconnor/datasets/master/data.csv")

# convert to ordered factors
for(i in 1:length(dat)){
  dat[,i] <- as.ordered(dat[,i])  # as.ordered() keeps the category order explicit
}

# compute polychoric correlations
pc <- hetcor(dat,ML=T)

# choose number of factors
# hetcor() returns a list; the correlation matrix itself is in $correlations
ev <- eigen(pc$correlations)
ap <- parallel(subject = nrow(dat), 
               var=ncol(dat),rep=100,cent=.05)
nS <- nScree(x = ev$values, aparallel = ap$eigen$qevpea)
dev.new(height=4,width=6,noRStudioGD = T)
plotnScree(nS) # 2 factors, maybe 1

# run FA
faPC <- fa(r=pc$correlations, nfactors = 2, rotate="varimax",fm="ml")
faPC$loadings

Edit: I've found a way to get scores using irt.fa() and scoreIrt(), but it involved converting my ordered categories to numeric so I'm not sure it's valid. Any advice would be much appreciated!

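# convert the factors back to their original numeric codes first;
# as.numeric() on a factor alone returns level indices, not the stored values
for(i in 1:length(dat)){dat[,i] <- as.numeric(as.character(dat[,i]))}
x <- as.matrix(dat)
fairt <- irt.fa(x = x,nfactors=2,correct=TRUE,plot=TRUE,n.obs=NULL,rotate="varimax",fm="ml",sort=FALSE)
scoreIrt(stats = fairt, items = dat, cut = 0.2, mod="logistic")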

That's an interesting problem. Regular factor analysis assumes your input measures are interval or ratio scaled. With ordinal variables, you have a couple of options: either use an IRT-based approach (in which case you'd be using something like the Graded Response Model), or do as you do in your example and use the polychoric correlation matrix as the input to the factor analysis. You can see more discussion of this issue here.
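To make the IRT option concrete, here is a minimal sketch of a two-dimensional Graded Response Model. It assumes the mirt package, which is not used anywhere in the question, and it runs on simulated 4-category items rather than the real data; the names `items`, `grm` and `theta` are purely illustrative.

```r
library(mirt)

set.seed(42)
n <- 500

# Simulate two latent traits, each driving three noisy indicators
f1 <- rnorm(n)
f2 <- rnorm(n)
latent <- cbind(f1 + rnorm(n), f1 + rnorm(n), f1 + rnorm(n),
                f2 + rnorm(n), f2 + rnorm(n), f2 + rnorm(n))

# Cut each continuous indicator into 4 ordered categories coded 1..4
items <- as.data.frame(apply(latent, 2, function(v) {
  as.integer(cut(v, breaks = c(-Inf, -1, 0, 1, Inf)))
}))
names(items) <- paste0("item", 1:6)

# Exploratory 2-factor graded response model
grm <- mirt(items, model = 2, itemtype = "graded", verbose = FALSE)

# Expected a posteriori (EAP) factor score estimates, one row per respondent
theta <- fscores(grm, method = "EAP")
head(theta)
```

With this route the scores come straight from the ordinal response model, so the categories are never treated as interval-scaled numbers.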

Most factor analysis packages have a method for getting factor scores, but the output depends on what you supply as input. For example, normally you can just use factor.scores() to get your expected factor scores, but only if you pass in your original raw score data. The problem here is that the factor analysis was run on the polychoric matrix, which contains no individual responses to score.

I'm not 100% sure (and someone please correct me if I'm wrong), but I think the following should be OK in your situation:

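library(polycor)  # hetcor()
library(psych)    # fa(), factor.scores()

dat <- read.csv("https://raw.githubusercontent.com/paulrconnor/datasets/master/data.csv")
dat_orig <- dat  # keep an untouched numeric copy for scoring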

# convert to ordered factors
for(i in 1:length(dat)){
    dat[,i] <- as.ordered(dat[,i])
}

# compute polychoric correlations
pc <- hetcor(dat,ML=T)

# run FA
faPC <- fa(r=pc$correlations, nfactors = 2, rotate="varimax",fm="ml")

factor.scores(dat_orig, faPC)

In essence what you're doing is:

  1. Calculate the polychoric correlation matrix
  2. Use that matrix to conduct the factor analysis and extract 2 factors and associated loadings
  3. Use the loadings from the FA and the raw (numeric) data to get your factor scores

Both this method and the one in your edit treat the original data as numeric rather than as factors. I think this should be OK because you are just projecting the raw data onto the factors identified by the FA, and the loadings already take the ordinal nature of your variables into account (since you used the polychoric matrix as input to the FA). The post linked above cautions against this approach, however, and suggests some alternatives; this is not a straightforward problem to solve.
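One detail worth checking: as far as I understand the psych documentation, factor.scores() recomputes Pearson correlations from the raw data when it builds the scoring weights, unless a correlation matrix is passed through its rho argument. The sketch below passes the polychoric matrix as rho and also rebuilds regression-method (Thurstone) scores by hand for comparison. It uses simulated 4-category items and psych::polychoric() instead of the real data and hetcor(), so all object names are illustrative.

```r
library(psych)

set.seed(1)
n <- 300

# Simulate two latent traits, each driving three noisy indicators
f1 <- rnorm(n)
f2 <- rnorm(n)
latent <- cbind(f1 + rnorm(n), f1 + rnorm(n), f1 + rnorm(n),
                f2 + rnorm(n), f2 + rnorm(n), f2 + rnorm(n))

# Cut each indicator into 4 ordered categories coded 1..4
items <- apply(latent, 2, function(v) {
  as.integer(cut(v, breaks = c(-Inf, -1, 0, 1, Inf)))
})
colnames(items) <- paste0("item", 1:6)

# Step 1: polychoric correlation matrix
pc <- polychoric(items)$rho

# Step 2: factor analysis on that matrix
fit <- fa(r = pc, nfactors = 2, rotate = "varimax", fm = "ml", n.obs = n)

# Step 3: score the raw data, telling factor.scores() to use the
# polychoric matrix (rho) when computing the weights
sc <- factor.scores(items, fit, rho = pc, method = "Thurstone")
head(sc$scores)

# The same idea by hand: regression weights W = R^{-1} L,
# applied to the standardized raw data
W <- solve(pc, unclass(fit$loadings))
manual <- scale(items) %*% W

# Should be close to zero if factor.scores() applies the same formula
max(abs(manual - sc$scores))
```

Written out this way, the assumption behind step 3 is easy to see: the weights come from the polychoric matrix, but they are still applied to the category codes as if those codes were interval-scaled.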
