data.table: create new columns with lapply
I have a data.table and want to apply a function to each subset of rows. Normally one would do as follows:
DT[, lapply(.SD, fun), by = y]  # fun stands for the function to apply
But in my case the function does not return a single value but a vector with several elements. Is there a way to do something like this?
library(data.table)
set.seed(9)
DT <- data.table(x1=letters[sample(x=2L,size=6,replace=TRUE)],
x2=letters[sample(x=2L,size=6,replace=TRUE)],
y=rep(1:2,3), key="y")
DT
# x1 x2 y
#1: a a 1
#2: a b 1
#3: a a 1
#4: a a 2
#5: a b 2
#6: a a 2
DT[, lapply(.SD, table), by = y]
# Desired Result, something like this:
# x1_a x2_a x2_b
# 3 2 1
# 3 2 1
Thanks in advance. Also, I would not mind if the result of the function had to have a fixed length.
You simply need to unlist the table and then coerce back to a list:
> DTCounts <- DT[, as.list(unlist(lapply(.SD, table))), by=y]
> DTCounts
y x1.a x2.a x2.b
1: 1 3 2 1
2: 2 3 2 1
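To see why this works, it can help to look at a single group by hand. The sketch below hard-codes the values of the printed DT (an assumption made so the result does not depend on the random number generator) and inspects the intermediate objects; `grp1` and `flat` are illustrative names that are not part of the original answer.

```r
library(data.table)

# Same values as the printed DT above, hard-coded instead of sampled
DT <- data.table(x1 = rep("a", 6),
                 x2 = c("a", "b", "a", "a", "b", "a"),
                 y  = c(1L, 1L, 1L, 2L, 2L, 2L), key = "y")

# Roughly what .SD contains for the group y == 1 (illustrative helper)
grp1 <- DT[y == 1, .(x1, x2)]

# lapply() yields one table per column; unlist() flattens them into a single
# named vector whose names have the form "<column>.<value>"
flat <- unlist(lapply(grp1, table))
names(flat)

# as.list() turns each named element into its own column, one row per group
DTCounts <- DT[, as.list(unlist(lapply(.SD, table))), by = y]
DTCounts
```

Because every group must produce a list of the same length and with the same names, this approach relies on each group containing the same set of values.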
If you do not like the dots in the names, you can sub them out:
> setnames(DTCounts, sub("\\.", "_", names(DTCounts)))
> DTCounts
y x1_a x2_a x2_b
1: 1 3 2 1
2: 2 3 2 1
Note that if not all values in a column are present for each group (i.e., if x2=c("a", "b") when y=1, but x2=c("b", "b") when y=2), then the above breaks.
The solution is to make the columns factors before counting.
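columnsToConvert <- c("x1", "x2") # or .. <- setdiff(names(DT), "y")
# convert the columns to factors in place
DT[, (columnsToConvert) := lapply(.SD, factor), .SDcols=columnsToConvert]
## OR build a new table from the converted columns
DT <- cbind(DT[, lapply(.SD, factor), .SDcols=columnsToConvert], y=DT[, y])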
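As a rough illustration of the factor approach, the sketch below uses a small made-up table (`DT2`, `cols`, and `counts` are hypothetical names, and the data is invented for this example) in which the value "a" never occurs in x2 when y is 2. Once the columns are factors, `table()` reports every level for every group, so the per-group results should line up.

```r
library(data.table)

# Hypothetical data: "a" does not occur in x2 within the group y == 2
DT2 <- data.table(x1 = rep("a", 4),
                  x2 = c("a", "b", "b", "b"),
                  y  = c(1L, 1L, 2L, 2L))

# Convert the character columns to factors by reference, so that table()
# counts all levels in every group, including levels with zero occurrences
cols <- c("x1", "x2")
DT2[, (cols) := lapply(.SD, factor), .SDcols = cols]

# Same counting step as in the answer above
counts <- DT2[, as.list(unlist(lapply(.SD, table))), by = y]
counts
```

With this data, the group y == 2 should get a zero in the x2.a column instead of dropping that column.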