How to exclude R data.table columns and then assign them a value

I am currently trying to run a Latent Class Analysis (LCA) in R with the depmixS4 package on the following dataset:

Subject   category   f1   f2   f3   f4
02retY          73    1    1    1    1
02retY         128    1    0    1    0
03CzUL           5    0    0    0    0
03CzUL          73    1    0    0    0
03CzUL          98    1    1    1    1

where each f_i is a filter. I use the following two functions with data.table to cluster each category into 2 classes:

library(data.table)
library(depmixS4)

# Fit a 2-class latent class model on the binary filters f1-f4 and
# return the most likely class for every row of dt
LCA <- function(dt) {
  mod1 <- mix(list(f1 ~ 1, f2 ~ 1, f3 ~ 1, f4 ~ 1),
              data = dt,
              nstates = 2,
              family = list(multinomial("identity"), multinomial("identity"),
                            multinomial("identity"), multinomial("identity")),
              respstart = runif(16))  # 2 states x 4 filters x 2 categories
  fmod1 <- fit(mod1, verbose = FALSE)
  posterior.states <- depmixS4::posterior(fmod1)
  return(posterior.states$state)
}

# Cluster the rows with no missing filter, separately within each category
UsablePosCategory <- function(DataTable) {
  DataTable[!is.na(f1) & !is.na(f2) & !is.na(f3) & !is.na(f4),
            cluster.usable := LCA(.SD),
            by = "category",
            .SDcols = f1:f4]
  return(DataTable)
}
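The by-group call in UsablePosCategory() is a split-apply-combine: the function is run once per group on that group's complete rows, and it has to return one value per row of the group so that := can write the result back. The base-R sketch below mimics that on the sample data; stand_in_cluster() is a hypothetical placeholder for LCA() (it is not a latent class model) and is used only so the sketch runs without depmixS4.

```r
# Sample data from the table above
dat <- data.frame(
  Subject  = c("02retY", "02retY", "03CzUL", "03CzUL", "03CzUL"),
  category = c(73, 128, 5, 73, 98),
  f1 = c(1, 1, 0, 1, 1),
  f2 = c(1, 0, 0, 0, 1),
  f3 = c(1, 1, 0, 0, 1),
  f4 = c(1, 0, 0, 0, 1)
)
filters <- c("f1", "f2", "f3", "f4")

# Hypothetical placeholder for LCA(): one integer label per row of the group
stand_in_cluster <- function(g) ifelse(rowSums(g) >= 2, 1L, 2L)

# Rows with no missing filter, like the !is.na(...) condition in UsablePosCategory()
complete <- stats::complete.cases(dat[filters])

# Rows that are not selected keep NA, as with data.table's := on a subset
dat$cluster.usable <- NA_integer_
for (idx in split(which(complete), dat$category[complete])) {
  dat$cluster.usable[idx] <- stand_in_cluster(dat[idx, filters, drop = FALSE])
}
dat$cluster.usable
```

If the per-group function returned a different number of values than the group has rows, this assignment (like := in data.table) would fail or recycle, which is why LCA() returns the full posterior state vector.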

However, in some categories a few of the f_i columns (e.g. f4 or f1) take only one unique value (e.g. in category 128, f5 is always 0), so the algorithm cannot find a solution and returns NA. Is there a way to select only the columns that have 2 distinct values (levels), and then, in the list(f1 ~ 1, f2 ~ 1, f3 ~ 1, f4 ~ 1) part of the LCA function, apply the ~ 1 only to the chosen columns? I hope that makes sense.
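To see which filters are affected, one can count the distinct values of each filter within each category. Below is a minimal base-R sketch on the sample data; usable_filters() is a hypothetical helper name, not part of depmixS4 or data.table. In the sample, categories 5, 98 and 128 contain a single row, so every filter is constant there, and in category 73 only f1 is constant.

```r
# Sample data from the table above
dat <- data.frame(
  Subject  = c("02retY", "02retY", "03CzUL", "03CzUL", "03CzUL"),
  category = c(73, 128, 5, 73, 98),
  f1 = c(1, 1, 0, 1, 1),
  f2 = c(1, 0, 0, 0, 1),
  f3 = c(1, 1, 0, 0, 1),
  f4 = c(1, 0, 0, 0, 1)
)
filters <- c("f1", "f2", "f3", "f4")

# Hypothetical helper: names of the columns that take more than one distinct value
usable_filters <- function(g) {
  names(g)[vapply(g, function(x) length(unique(x)) > 1L, logical(1))]
}

# Apply it separately within each category (list names are the category values)
usable <- lapply(split(dat[filters], dat$category), usable_filters)
usable[["73"]]   # "f2" "f3" "f4"
usable[["128"]]  # character(0)
```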

Here is an option. The first line of code identifies the columns that take more than one unique value in the current group. The next line then creates a list of intercept-only formulae (e.g. f2 ~ 1), one for each of those columns.

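library(data.table)
library(depmixS4)

LCA <- function(dt) {
    # keep only the filters that take more than one distinct value in this group
    cols <- names(dt)[dt[, sapply(.SD, function(x) uniqueN(x) > 1L)]]
    # one intercept-only formula per usable column, built from that column's name
    fml <- lapply(cols, function(x) as.formula(paste0(x, " ~ 1")))
    mod1 <- depmixS4::mix(fml,
        data = dt,
        nstates = 2,
        family = replicate(length(cols), multinomial("identity"), simplify = FALSE),
        # 2 states x 2 categories per binary filter (16 when all four are used)
        respstart = runif(4 * length(cols)))
    fmod1 <- fit(mod1, verbose = FALSE)
    posterior.states <- depmixS4::posterior(fmod1)
    return(posterior.states$state)
}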
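The formula-building step can be checked on its own, without fitting a model. This base-R sketch assumes the usable columns are the ones found for category 73 in the sample data; stats::reformulate() is shown as an equivalent alternative to as.formula(paste0(...)). The starting-value count assumes, as the original runif(16) did for four filters, two states times two response probabilities per binary filter.

```r
# Assumed usable columns for one group (category 73 in the sample data)
cols <- c("f2", "f3", "f4")

# One intercept-only formula per usable column: f2 ~ 1, f3 ~ 1, f4 ~ 1
fml <- lapply(cols, function(x) as.formula(paste0(x, " ~ 1")))

# Equivalent construction with stats::reformulate()
fml2 <- lapply(cols, function(x) reformulate("1", response = x))

# Number of starting values for respstart (assumption, see above):
# 2 states x 2 response categories x number of filters (16 for all four)
n_respstart <- 2L * 2L * length(cols)

vapply(fml, deparse, character(1))  # "f2 ~ 1" "f3 ~ 1" "f4 ~ 1"
```

Building each formula from its own column name matters: pasting the whole cols vector inside the lapply() would produce the same (or an invalid) formula for every element.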

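Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.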

 
© 2020-2024 STACKOOM.COM