Group by in data.table in R, keeping only the non-NA values from columns

I am new to R.

I want to group a data.table by one column (c1) and keep only the non-NA values from the other columns.

My table looks like this:

c1   c2   c3   c4
1    A    NA   NA
1    NA   B    NA
1    NA   NA   C
2    A1   NA   NA
2    NA   B1   NA
2    NA   NA   C1
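To reproduce the example, the table can be built directly as a data.table. The name df1 follows the answer code; storing c1 as a numeric column and c2 to c4 as character columns is an assumption about the original data:

```r
library(data.table)

# The question's input table; NA marks a missing value
df1 <- data.table(
  c1 = c(1, 1, 1, 2, 2, 2),
  c2 = c("A", NA, NA, "A1", NA, NA),
  c3 = c(NA, "B", NA, NA, "B1", NA),
  c4 = c(NA, NA, "C", NA, NA, "C1")
)
print(df1)
```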

I want the result to be:

c1   c2   c3   c4
1    A    B    C
2    A1   B1   C1

I hope someone can help!

Try:

library(data.table)
# convert df1 to a data.table in place, then drop the NAs of every column within each c1 group
setDT(df1)[, lapply(.SD, na.omit), by = c1]
#    c1 c2 c3 c4
#1:  1  A  B  C
#2:  2 A1 B1 C1
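This collapse returns one row per group only because, in the question's data, each column has exactly one non-NA value within each c1 group. A quick way to check that assumption before relying on it (a sketch, not part of the answer):

```r
library(data.table)

df1 <- data.table(
  c1 = c(1, 1, 1, 2, 2, 2),
  c2 = c("A", NA, NA, "A1", NA, NA),
  c3 = c(NA, "B", NA, NA, "B1", NA),
  c4 = c(NA, NA, "C", NA, NA, "C1")
)

# Count the non-NA values of every column within each c1 group
counts <- df1[, lapply(.SD, function(x) sum(!is.na(x))), by = c1]
print(counts)

# TRUE when every column has exactly one non-NA value in every group,
# which is what lets the na.omit collapse return one row per group
one_per_group <- all(as.matrix(counts[, !"c1"]) == 1L)
print(one_per_group)
```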

Or:

# same idea, subsetting each column with !is.na() instead of calling na.omit()
setDT(df1)[, lapply(.SD, function(x) x[!is.na(x)]), by = c1]
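If a group could have several non-NA values, or none, in a column, the per-group vectors returned by the two approaches above would not all have length 1, so the result would no longer reliably be one row per group. One possible variant (not from the answer) keeps only the first non-NA value of each column and yields NA when there is none. The data below is hypothetical:

```r
library(data.table)

# Hypothetical data: group 2 has two values for c2 and none for c4
dt_multi <- data.table(
  c1 = c(1, 1, 1, 2, 2, 2),
  c2 = c("A", NA, NA, "A1", "A2", NA),
  c3 = c(NA, "B", NA, NA, "B1", NA),
  c4 = c(NA, NA, "C", NA, NA, NA)
)

# Keep the first non-NA value of each column per group; indexing an
# empty vector with [1L] gives NA, so every group yields exactly one row
res <- dt_multi[, lapply(.SD, function(x) x[!is.na(x)][1L]), by = c1]
print(res)
```

Taking the first value is an arbitrary choice; whether it is appropriate depends on what duplicate values mean in the real data.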

I benchmarked the two methods in @akrun's answer and found that method 2 (x[!is.na(x)]) is faster: its median time is roughly 8 times lower than that of the na.omit version in the benchmark below.

Update: I also added a function that uses complete.cases, as @akrun suggested.
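f3 differs from f2 only in using complete.cases() instead of !is.na(). For a single atomic vector the two select the same elements, so f2 and f3 should return the same result. A minimal base-R check on a made-up vector:

```r
# A character vector with missing values
x <- c("A", NA, "B", NA)

# For a single atomic vector, complete.cases() flags the non-missing
# elements, giving the same logical vector as !is.na()
keep <- complete.cases(x)
print(keep)
print(identical(keep, !is.na(x)))
print(x[keep])
```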

 library(data.table)
 library(microbenchmark)
 # dt2 is the benchmark data.table (same columns as df1); its construction is not shown
 f1 <- function (d) d[, lapply(.SD, na.omit), by = c1]
 f2 <- function (d) d[, lapply(.SD, function(x) x[!is.na(x)]), by = c1]
 f3 <- function (d) d[, lapply(.SD, function(x) x[complete.cases(x)]), by = c1]

 microbenchmark(f1(copy(dt2)), f2(copy(dt2)), f3(copy(dt2)))


#Unit: milliseconds
#          expr       min        lq      mean    median        uq       max neval
# f1(copy(dt2)) 124.22661 132.84712 138.00615 135.48418 140.18581 222.20735   100
# f2(copy(dt2))  14.47915  16.37986  18.15728  17.35153  18.38754  28.72007   100
# f3(copy(dt2))  22.10803  24.43208  27.63959  26.18713  31.58418  39.31601   100
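The post does not show how dt2 was created. The sketch below builds a hypothetical dt2 with the same layout as the question's table (three rows per group, one non-NA value per column per group; the size n is an arbitrary choice) and checks that f2 and f3 return the same result, so the timings compare equivalent outputs. Timings on this data need not match those reported above.

```r
library(data.table)

# Hypothetical benchmark table: n groups of 3 rows, each column holding
# exactly one non-NA value per group, like the question's example
n <- 10000L
dt2 <- data.table(
  c1 = rep(seq_len(n), each = 3L),
  c2 = NA_character_,
  c3 = NA_character_,
  c4 = NA_character_
)
dt2[seq(1L, by = 3L, length.out = n), c2 := paste0("A", seq_len(n))]
dt2[seq(2L, by = 3L, length.out = n), c3 := paste0("B", seq_len(n))]
dt2[seq(3L, by = 3L, length.out = n), c4 := paste0("C", seq_len(n))]

f2 <- function(d) d[, lapply(.SD, function(x) x[!is.na(x)]), by = c1]
f3 <- function(d) d[, lapply(.SD, function(x) x[complete.cases(x)]), by = c1]

r2 <- f2(copy(dt2))
r3 <- f3(copy(dt2))
# all.equal() is used rather than identical(), which can be thrown off
# by data.table's internal attributes
print(isTRUE(all.equal(r2, r3)))
```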

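Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.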

 