
R data.table "j" reference to "by" variables very unintuitive?

I'm working through the data.table DataCamp exercises, and something there really disturbs my sense of logic. Columns that are referred to in the "by" argument seem to be treated differently from other columns?

The data table used is the following:

         DT
      x  y  z
   1: 2  1  2
   2: 1  3  4
   3: 2  5  6
   4: 1  7  8
   5: 2  9 10
   6: 2 11 12
   7: 1 13 14

When I enter DT[,sum(x),x] I would expect:

   x V1
1: 2  8
2: 1  3

but I get:

   x V1
1: 2  2
2: 1  1

For other columns I get the group sum I would expect:

> DT[,sum(y),x]
      x V1
   1: 2 26
   2: 1 23
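The likely explanation is that, inside "j", a column named in "by" is not the group's subset of rows but a length-1 vector holding that group's value, so sum(x) just returns the value itself. Here is a minimal sketch to check this; the DT construction is my reconstruction of the table printed above, and the column names len_x, len_y and n are only illustrative:

```r
library(data.table)

# Reconstruction of the example table shown in the question
DT <- data.table(
  x = c(2, 1, 2, 1, 2, 2, 1),
  y = seq(1, 13, by = 2),
  z = seq(2, 14, by = 2)
)

# Compare how long x and y are inside j when grouping by x.
# .N is the number of rows in the current group.
res <- DT[, .(len_x = length(x), len_y = length(y), n = .N), by = x]
print(res)
# len_x is 1 in every group, while len_y matches .N (4 and 3)
```

So sum(y) adds up all rows of the group, while sum(x) sums a single value.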

One way to fix this is to give the grouping variable a different name in "by" and rename it back afterwards:

setnames(DT[, sum(x), .(xN=x)], "xN", "x")[]
#   x V1
#1: 2  8
#2: 1  3
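Two other ways to get the same group sums without renaming, sketched under the assumption that DT is built as in the table above. The first multiplies the length-1 group value by the group size .N (this only works for a sum); the second uses .I, the row numbers of the current group, to index the full column:

```r
library(data.table)

# Reconstruction of the example table shown in the question
DT <- data.table(
  x = c(2, 1, 2, 1, 2, 2, 1),
  y = seq(1, 13, by = 2),
  z = seq(2, 14, by = 2)
)

# Option 1: x is a single value per group, so value * group size gives the sum
a <- DT[, .(V1 = x * .N), by = x]

# Option 2: .I holds the row indices of the current group in DT,
# so DT$x[.I] is the full-length x vector for that group
b <- DT[, .(V1 = sum(DT$x[.I])), by = x]

print(a)
print(b)
```

The second form is more general, since any function can be applied to DT$x[.I], not just sum.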
