
How to generate a latent variable from a set of different kinds of variables with R?

For n observations, I want to generate a latent (unobserved) variable from a set of other observed variables that proxy it; I may or may not assume a specific distribution for it. In my case, the latent variable is ability, and the proxies are observed measures of ability: one variable is discrete and approximately normal, another is binary and very skewed, and the last is an ordered categorical variable. My data look like the example below, and I would like to estimate a latent score for each observation.

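set.seed(123877)
# number of units
n <- 1000L

# age: approximately normal (mean 25, sd 10)
age <- rnorm(n, 25, 10)

# cum laude: binary and highly skewed (about 10% ones)
hon <- sample(0L:1L, n, TRUE, prob = c(.9, .1))

# prestige of university: ordered factor with 25 levels;
# raw values 1..25 are relabelled 25..1, so label "1" is the top level
pres <- factor(sample(1L:25L, n, TRUE), labels = 25L:1L, ordered = TRUE)

dat <- data.frame(id = 1L:n, age, hon, pres)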
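In the simulation above the three proxies are drawn independently of one another, so there is no common factor for any latent-variable model to recover. A minimal sketch, assuming a standard-normal ability theta and made-up loadings, intercepts and thresholds (all hypothetical, not from the original), that generates the same three kinds of proxies from one shared latent variable:

```r
set.seed(123877)
n <- 1000L

# hypothetical true latent ability, standard normal
theta <- rnorm(n)

# discrete, roughly normal proxy: linear in theta plus noise (assumed loading 3)
age <- round(25 + 3 * theta + rnorm(n, 0, 9))

# skewed binary proxy: logistic in theta; intercept -2.5 gives roughly 10% ones
hon <- rbinom(n, 1L, plogis(-2.5 + 1.2 * theta))

# ordered proxy: cut a latent logistic response at 9 assumed thresholds -> 10 levels
ystar <- 1.5 * theta + rlogis(n)
pres <- cut(ystar, breaks = c(-Inf, seq(-3, 3, length.out = 9), Inf),
            labels = 1:10, ordered_result = TRUE)

dat <- data.frame(id = seq_len(n), age, hon, pres)
```

Because theta is known here, correlating it with whatever score a model estimates from dat is a direct check of how well the latent variable is recovered.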

I found a solution using the ltm package, which fits a generalized partial credit model (gpcm) to the three proxies. Here is the code:

library('ltm')

set.seed(123877)
# gpcm() fails on some simulated draws, so regenerate the data until a fit
# succeeds; this retry loop only makes sense for simulated data, since real
# data cannot be redrawn
repeat {
  # number of units
  n <- 1000L

  # age, rounded to whole years
  age <- round(rnorm(n, 25, 10))

  # cum laude
  hon <- sample(0L:1L, n, TRUE, prob = c(.9, .1))

  # prestige of university, coded 1..10
  pres <- sample(1L:10L, n, TRUE)

  # pres <- factor(pres, levels = 1L:25L, ordered = TRUE)
  dat <- data.frame(age, hon, pres)

  # latent variable: generalized partial credit model
  u.latent <- try(gpcm(dat))
  if (!inherits(u.latent, 'try-error')) break
}
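gpcm() treats each column as a polytomous item, so a rounded age with dozens of distinct values becomes an item with dozens of response categories, each needing its own parameter. A minimal base-R sketch that counts those categories and, as a hypothetical recoding (an assumption, not something the original tested), collapses age into five ordered groups at its sample quintiles before fitting:

```r
set.seed(123877)
n <- 1000L
age <- round(rnorm(n, 25, 10))

# number of distinct values = number of categories the item model must handle
n_cat <- length(unique(age))
print(n_cat)

# hypothetical recoding: 5 ordered groups cut at the sample quintiles;
# labels = FALSE returns integer codes 1..5
age5 <- cut(age, breaks = unique(quantile(age, probs = seq(0, 1, 0.2))),
            include.lowest = TRUE, labels = FALSE)
print(table(age5))
```

The recoded column could then replace age in data.frame(age = age5, hon, pres) before calling gpcm(); whether five groups is a sensible choice depends on the real data.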

We can test whether the model fits the data:

GoF.gpcm(u.latent)
# H0: the model fits the data
# Ha: the model does not fit the data
# a small p-value (e.g. < 0.05) is evidence that the model does not fit

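The estimates of the latent variable are straightforward to obtain:

# by default factor.scores() scores each observed response pattern once;
# resp.patterns = dat returns one score per observation instead
u.estimates <- factor.scores(u.latent, resp.patterns = dat)
hist(u.estimates$score.dat$z1)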
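For intuition about what a factor score is, here is a minimal base-R sketch of an expected a posteriori (EAP) score, the posterior mean of ability given a response, for the binary hon item alone. The 2PL item parameters a and b are hypothetical, and factor.scores() uses empirical Bayes (posterior mode) scores by default and all items jointly, so this illustrates the idea rather than reproducing ltm's numbers:

```r
# hypothetical 2PL parameters for the binary 'hon' item (assumptions)
a <- 1.2   # discrimination
b <- 2     # difficulty: honours are rare

# quadrature grid for theta with a standard-normal prior, normalised to sum to 1
nodes <- seq(-4, 4, length.out = 81)
prior <- dnorm(nodes)
prior <- prior / sum(prior)

eap <- function(y) {
  p <- plogis(a * (nodes - b))              # P(hon = 1 | theta)
  lik <- if (y == 1) p else 1 - p           # likelihood of the observed response
  post <- lik * prior / sum(lik * prior)    # discrete posterior over the grid
  sum(nodes * post)                         # posterior mean = EAP score
}

print(c(hon0 = eap(0), hon1 = eap(1)))
```

A respondent with hon = 1 gets a score above the prior mean of 0 and one with hon = 0 a score slightly below it; because honours are rare, observing hon = 1 moves the estimate much further.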
