Using data.table to calculate a function which depends on many columns
There are many posts that discuss applying a function over many columns when using data.table. However, I need to calculate a function that depends on many columns. As an example:
library(data.table)
# Create a data table with 26 columns. Variable names are var1, ..., var26
data.mat = matrix(sample(letters, 26*26, replace=TRUE),ncol=26)
colnames(data.mat) = paste("var",1:26,sep="")
data.dt <- data.table(data.mat)
Now, say I would like to count the number of 'a's in columns 5, 6, 7 and 8. I cannot see how to do this with .SDcols, and end up doing:
data.dt[,numberOfAs := (var5=='a')+(var6=='a')+(var7=='a')+(var8=='a')]
Which is very tedious. Is there a more sensible way to do this?
Thanks
I really suggest going through the data.table vignettes. Section 2e of the Introduction to data.table vignette explains .SD and .SDcols.
.SD is just a data.table containing the data for the current group, and .SDcols specifies which columns .SD should contain. A useful way to understand this is to use print to see the content.
# .SD contains cols 5:8
data.dt[, print(.SD), .SDcols=5:8]
Since there is no by here, .SD contains all the rows of data.dt, restricted to the columns specified in .SDcols.
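As a rough illustration of the point above, the following sketch inspects .SD on a small table instead of printing it. The table toy and its columns g, v1, v2, v3 are hypothetical and are not part of the question's data; the sketch only assumes that the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table, not the 26-column table from the question
toy <- data.table(g  = c("x", "x", "y"),
                  v1 = c("a", "b", "a"),
                  v2 = c("a", "a", "c"),
                  v3 = c("b", "a", "a"))

# Without `by`, .SD is the whole table restricted to the columns in .SDcols
toy[, names(.SD), .SDcols = c("v1", "v2")]   # "v1" "v2"
toy[, nrow(.SD),  .SDcols = c("v1", "v2")]   # 3

# With `by`, .SD holds only the rows of the current group
rows_per_group <- toy[, .(n = nrow(.SD)), by = g, .SDcols = c("v1", "v2")]
rows_per_group$n                             # 2 1
```

The grouped call shows why the ungrouped case sees every row: with no by there is a single group covering the whole table.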
Once you understand this, the task really reduces to your knowledge of base R. You can accomplish this in more than one way.
data.dt[, numberOfAs := rowSums(.SD == "a"), .SDcols=5:8]
We return a logical matrix by comparing all the columns in .SD to "a", and then use rowSums to sum them up row by row.
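To make the two steps concrete, here is a minimal sketch on a three-row table whose counts can be checked by hand. The table small and its columns are made up for illustration, and the standalone as.matrix comparison is only a stand-in for what .SD == "a" produces inside the data.table call.

```r
library(data.table)

# Hypothetical three-row table, small enough to verify by eye
small <- data.table(v1 = c("a", "b", "a"),
                    v2 = c("a", "a", "c"),
                    v3 = c("b", "a", "a"))

# Step 1: an element-wise comparison yields a logical matrix
# (done outside the data.table call here, purely for illustration)
is_a <- as.matrix(small[, .(v1, v2)]) == "a"

# Step 2: rowSums counts the TRUE values in each row
manual_counts <- rowSums(is_a)               # 2 1 1

# The same computation in one step, restricted to v1 and v2 via .SDcols
small[, numberOfAs := rowSums(.SD == "a"), .SDcols = c("v1", "v2")]
small$numberOfAs                             # 2 1 1
```

Column v3 is ignored because it is not listed in .SDcols, which is how the original call limits the count to columns 5 to 8.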
Another way, using Reduce:
data.dt[, numberOfAs := Reduce(`+`, lapply(.SD, function(x) x == "a")), .SDcols=5:8]
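As a sanity check, the sketch below rebuilds a table like the one in the question and confirms that the rowSums version, the Reduce version, and the spelled-out sum agree. The seed, the names dt and cols, and the result column names (viaRowSums, viaReduce, viaManual) are arbitrary choices made for this illustration. It also selects the columns by name, which may be more robust than positions if the column order changes.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the run is reproducible
m <- matrix(sample(letters, 26 * 26, replace = TRUE), ncol = 26)
colnames(m) <- paste0("var", 1:26)
dt <- data.table(m)

# Select the columns by name rather than by position
cols <- paste0("var", 5:8)

dt[, viaRowSums := rowSums(.SD == "a"), .SDcols = cols]
dt[, viaReduce  := Reduce(`+`, lapply(.SD, function(x) x == "a")), .SDcols = cols]
dt[, viaManual  := (var5 == "a") + (var6 == "a") + (var7 == "a") + (var8 == "a")]

# All three approaches should give the same per-row counts
all(dt$viaRowSums == dt$viaReduce)
all(dt$viaReduce == dt$viaManual)
```

The Reduce version works column by column and never builds a matrix, which may matter for very wide or very long tables; for four columns the difference is unlikely to be noticeable.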