R data.table: inconsistent results when aggregating a grouping column

Question

I'm using the data.table package to aggregate a column that is also the grouping column, but the results are not what I expected.

library(data.table)
my_data <- data.table(contnt = c("america", "asia", "asia", "europe", "europe", "europe"), num = 1:6)

#my_data
#contnt  num
#america  1
#asia     2
#asia     3
#europe   4
#europe   5
#europe   6

my_data[, length(contnt),by=contnt]
#contnt  V1
#america  1
#asia     1
#europe   1

It works differently when I aggregate a column other than the grouping column:

my_data[, length(num),by=contnt]
#contnt  V1
#america  1
#asia     2
#europe   3

What causes this discrepancy?

Answer 1

This is a great example to demonstrate how data.table passes grouping variables, as opposed to other variables, to functions:

my_data[,print(contnt),by=contnt]
# [1] "america"
# [1] "asia"
# [1] "europe"

my_data[,print(num),by=contnt]
# [1] 1
# [1] 2 3
# [1] 4 5 6

Essentially, within each group a grouping variable is passed as a vector of length 1, whereas for any other variable the entire vector of that group's values is passed.
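To make the difference concrete, here is a minimal sketch that measures both lengths side by side in one call. It reuses the question's my_data; the names lens, len_group_col and len_other_col are only illustrative and not part of data.table.

```r
library(data.table)

# Same data as in the question
my_data <- data.table(
  contnt = c("america", "asia", "asia", "europe", "europe", "europe"),
  num = 1:6
)

# Inside j, the grouping column is a length-1 vector,
# while a non-grouping column holds every value of the group.
lens <- my_data[, .(len_group_col = length(contnt),
                    len_other_col = length(num)),
                by = contnt]
print(lens)
```

Here len_group_col is 1 for every group, while len_other_col matches the number of rows in each group.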

Answer 2

Please study the data.table FAQ:

Inside each group, why are the group variables length-1?

[...] x is a grouping variable and (as from v1.6.1) has length 1 (if inspected or used in j). It's for efficiency and convenience. [...]

If you need the size of the current group, use .N rather than calling length() on any column.
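As a short sketch of that advice applied to the question's data: .N gives the row count per group whichever column is the grouping column. The names counts, n_rows, expanded and len_expanded are illustrative assumptions. The second call shows one possible way to recover a full-length copy of the grouping value, using base R's rep(), in case a function really needs it.

```r
library(data.table)

# Same data as in the question
my_data <- data.table(
  contnt = c("america", "asia", "asia", "europe", "europe", "europe"),
  num = 1:6
)

# .N is the number of rows in the current group
counts <- my_data[, .(n_rows = .N), by = contnt]
print(counts)

# If a full-length vector of the grouping value is needed,
# repeat the length-1 value .N times
expanded <- my_data[, .(len_expanded = length(rep(contnt, .N))), by = contnt]
print(expanded)
```

Both calls report 1, 2 and 3 rows for america, asia and europe respectively.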
