Count the number of records and generate row numbers within each group in a data.table

Question

I have the following data.table:

library(data.table)
set.seed(1)
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
    VAL
 1:   1
 2:   2
 3:   2
 4:   3
 5:   1
 6:   3
 7:   3
 8:   2
 9:   2
10:   1

Within each value of VAL I want to:

Count the number of records/rows
Create a row index (counter) of the first, second, third occurrence, etc.

At the end I want this result:

    VAL COUNT IDX
 1:   1     3   1
 2:   2     4   1
 3:   2     4   2
 4:   3     3   1
 5:   1     3   2
 6:   3     3   2
 7:   3     3   3
 8:   2     4   3
 9:   2     4   4
10:   1     3   3

where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".

I tried to work with which and length using .I:

DT[, list(COUNT = length(VAL == VAL[.I]),
          IDX = which(which(VAL == VAL[.I]) == .I))]

But this does not work, because .I is a vector of row indices, so I guess one must use .I[]. Inside .I[], though, I face the same problem: I do not have the row index. I also know (from reading the data.table FAQ and following the posts here) that looping through rows should be avoided if possible.

So, what's the data.table way?

Answer 1

Using .N:

DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
#    VAL COUNT IDX
# 1:   1     3   1
# 2:   2     4   1
# 3:   2     4   2
# 4:   3     3   1
# 5:   1     3   2
# 6:   3     3   2
# 7:   3     3   3
# 8:   2     4   3
# 9:   2     4   4
#10:   1     3   3

.N is the number of records in each group, with groups defined by "VAL".
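To make the role of .N concrete, here is a minimal sketch that separates the two uses. It assumes the data.table package is installed. The names DT2 and counts are made up for illustration, and the values of VAL are typed in by hand (copied from the table in the question) so the result does not depend on the random number generator. seq_len(.N) is used in place of 1:.N; inside by the two give the same numbers, since every group has at least one row.

```r
library(data.table)

# Same VAL values as in the question, entered explicitly (illustrative copy).
DT2 <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Aggregating: one row per group, with .N as the group size.
counts <- DT2[, .(COUNT = .N), by = VAL]

# Adding columns by reference: the single value .N is recycled across
# every row of its group, and seq_len(.N) numbers the rows within the group.
DT2[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]

print(counts)
print(DT2)
```

The first call returns a summary with one row per distinct VAL, while the second keeps all ten rows and should reproduce the COUNT and IDX columns requested in the question.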

Answer 1 (accepted; score 96; 2013-11-08 21:53:45)
