How to use data.table fifelse with vectors in the arguments?

Say I have this data.frame:

DF <- data.frame(one=c(1, NA, NA, 1, NA, NA), two=c(NA,1,NA, NA, NA,1), 
         three=c(NA,NA, 1, NA, 1,NA))

one    two  three         output
  1     NA    NA             one
 NA      1    NA             two
 NA     NA     1           three
  1     NA    NA             one  
 NA     NA     1           three
 NA      1    NA             two

The columns are mutually exclusive.
I need to generate the output column:

output=c("one","two","three","one","three", "two")

I tried to solve it with data.table's fifelse, but it throws an error:

with(DF,fifelse(one==1, "one", fifelse(two==1,"two", "three", na="three"), 
   na=fifelse(two==1,"two", "three", na="three")))

Error in fifelse(one == 1, "one", fifelse(two == 1, "two", "three", na = "three"),  : 
  Length of 'na' is 6 but must be 1

It does not seem to accept vectors in its arguments.

dplyr's if_else works fine here:

with(DF,if_else(one==1, "one", if_else(two==1,"two", "three", missing="three"), 
   missing=if_else(two==1,"two", "three", missing="three")))

How can I get the same output with data.table?

Any other simple alternative is welcome too. I could use base R:

apply(DF,1, function(x) which(!is.na(x)))

and then replace each number with the corresponding string.

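Another data.table alternative (DF is converted with setDT() first, because set() can add a new column only to a data.table, not to a plain data.frame):

library(data.table)
setDT(DF)
for (col in names(DF)) set(DF, which(DF[[col]] == 1), j = "output", value = col)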

fifelse is not the best tool here; I suggest fcase, which is easier:

data.table

library(data.table)
as.data.table(DF)[, fcase(one == 1, "one", two == 1, "two", three == 1, "three")]
# [1] "one"   "two"   "three" "one"   "three" "two"  
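If you would rather stay with fifelse, one way around the scalar-only na argument is to make sure the test never evaluates to NA. The following is a minimal sketch, assuming the same DF as in the question; it uses %in% (which returns FALSE rather than NA for missing values) instead of ==, so na is not needed at all.

```r
library(data.table)

DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

# `x %in% 1` is FALSE where x is NA, so every test is a clean TRUE/FALSE
# and the nested fifelse calls never need the `na` argument.
out <- with(DF, fifelse(one %in% 1, "one",
                fifelse(two %in% 1, "two", "three")))
out
```

Note that, like the original nested call, this labels any row that is neither one nor two as "three", so it relies on the columns being mutually exclusive and every row having a value.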

dplyr

The dplyr analog is case_when:

library(dplyr)
with(DF, case_when(one == 1 ~ "one", two == 1 ~ "two", three == 1 ~ "three"))
# [1] "one"   "two"   "three" "one"   "three" "two"  

base R

Both the data.table and dplyr implementations assume the column names are known a priori. A name-agnostic base-R approach:

colnames(DF)[apply(DF, 1, which.max)]
# [1] "one"   "two"   "three" "one"   "three" "two"  

(Incidentally, which.max could just as well be which.min here; we are really only looking for the single non-NA value.)

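In this case, if you have other columns that should not be considered, you need to subset DF inside apply(DF, ...) so that it only looks at the desired columns.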
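The subsetting step might look like the sketch below. The extra id column and the cols vector are hypothetical, added only for illustration; the sketch also assumes every row has at least one non-NA value among the selected columns, since which.max returns a zero-length result for an all-NA row.

```r
# Same data as in the question, plus a hypothetical `id` column to ignore
DF <- data.frame(id    = c("a", "b", "c", "d", "e", "f"),
                 one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

cols <- c("one", "two", "three")   # the columns that should be considered

# Restrict to `cols` before apply() so the character `id` column does not
# turn the whole matrix into character, then index `cols` itself.
out <- cols[apply(DF[cols], 1, which.max)]
out
```

Indexing cols rather than colnames(DF) keeps the positions returned by which.max aligned with the subset that was actually scanned.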

If each row has exactly one non-NA value, you can try max.col:

> names(DF)[max.col(!is.na(DF))]
[1] "one"   "two"   "three" "one"   "three" "two"
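Because max.col breaks ties at random by default, the "exactly one non-NA value per row" assumption matters. A minimal sketch that checks the assumption first and makes the tie-breaking deterministic (the present and out names are just illustrative):

```r
DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

present <- !is.na(DF)   # logical matrix: TRUE where a value is present

# Fail early if a row has zero or several non-NA values, because
# max.col would otherwise silently pick one column for that row.
stopifnot(all(rowSums(present) == 1))

# ties.method = "first" avoids the default random tie-breaking
out <- names(DF)[max.col(present, ties.method = "first")]
out
```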

col + na.omit (but this may be slow if speed is what you are after):

> names(DF)[na.omit(c(t(col(DF) * DF)))]
[1] "one"   "two"   "three" "one"   "three" "two"

Benchmark

library(microbenchmark)
microbenchmark(
    f1 = names(DF)[max.col(!is.na(DF))],
    f2 = names(DF)[na.omit(c(t(col(DF) * DF)))]
)

Unit: microseconds
 expr   min     lq    mean median    uq    max neval
   f1  28.5  51.45  92.343  64.40  91.8 1532.5   100
   f2 300.7 527.65 634.755 595.35 691.5 2405.4   100
