简体   繁体   中英

How to exclude R data.table columns and then assign them a value

Currently I am trying to use Latent Class Analysis (LCA) in R using the depmixS4 library in the following dataset:

Subject     category       f1   f2  f3  f4  
02retY      73             1    1   1   1   
02retY      128            1    0   1   0   
03CzUL        5            0    0   0   0   
03CzUL       73            1    0   0   0   
03CzUL      98             1    1   1   1   

where each f_i is a filter. I have used the following 2 functions in data.table in order to clusterize each category in 2 classes :

LCA <- function(dt,y) {
  mod1 <- mix(list(f1 ~ 1, f2 ~ 1, f3 ~ 1, f4 ~ 1), 
              data = dt, 
              nstates = 2,
              family = list(multinomial("identity"), multinomial("identity"), multinomial("identity"), multinomial("identity")),
              respstart=runif(16))
  fmod1 <- fit(mod1, verbose=FALSE)
  posterior.states <- depmixS4::posterior(fmod1)
  return(posterior.states$state)
}

UsablePosCategory <- function(DataTable) {
  DataTable[!is.na(f1) & !is.na(f2) & !is.na(f3) &!is.na(amplitude.f4),
                              cluster.usable := LCA(.SD, x), 
                              by = c("week", "pc" ),
                              .SDcols = f1:f4]
  return(DataTable)
}

However there are a few f_i s (ex. f4 or f1 ) in some categories that only has 1 unique value (ex. for category 128 f5 has only 0) and thus the algorithm cannot give a solution and throws as a result an NA . is there a way to select only the columns that have 2 factors/levels/values and then in the part of the LCA function in the list(f1 ~ 1, f2 ~ 1, f3 ~ 1, f4 ~ 1) make the assignation ~ 1 to the chosen columns? I don't know if I explain myself?

Here is an option. The first line of code identify the columns with more than 2 unique values. Then the next line creates a list of formulae consisting of those columns.

LCA <- function(dt) {
    cols <- names(dt)[dt[, sapply(.SD, function(x) uniqueN(x) > 1L)]]
    fml <- lapply(cols, function(x) as.formula(paste0(cols, " ~ 1")))
    mod1 <- depmixS4::mix(fml, 
        data = dt, 
        nstates = 2,
        family = replicate(length(cols), multinomial("identity"), simplify=FALSE),
        respstart=runif(16))
    fmod1 <- fit(mod1, verbose=FALSE)
    posterior.states <- depmixS4::posterior(fmod1)
    return(posterior.states$state)
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM