Count number of records and generate row number within each group in a data.table
I have the following data.table:
library(data.table)
set.seed(1)
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
VAL
1: 1
2: 2
3: 2
4: 3
5: 1
6: 3
7: 3
8: 2
9: 2
10: 1
Within each number in VAL, I want to count the number of records and generate a row number.
At the end I want the result:
VAL COUNT IDX
1: 1 3 1
2: 2 4 1
3: 2 4 2
4: 3 3 1
5: 1 3 2
6: 3 3 2
7: 3 3 3
8: 2 4 3
9: 2 4 4
10: 1 3 3
where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".
I tried to work with which and length using .I:
DT[, list(COUNT = length(VAL == VAL[.I]),
          IDX = which(which(VAL == VAL[.I]) == .I))]
but this does not work, as .I refers to a vector with the index, so I guess one must use .I[]. Inside .I[], though, I again face the problem that I do not have the row index, and I do know (from reading the data.table FAQ and following the posts here) that looping through rows should be avoided if possible.
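To make the point about .I concrete, here is a minimal sketch of what the attempt above evaluates to. It assumes the VAL column is typed in by hand from the printed table rather than drawn with sample(), and the name `attempt` is only illustrative. Without `by`, .I is the full vector of row numbers, so VAL[.I] is just VAL and every comparison is TRUE.

```r
library(data.table)

# VAL copied from the printed table (assumption: avoids depending on the RNG)
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Without `by`, .I holds every row number of the table
all_rows <- DT[, .I]

# The attempt compares VAL with itself, so it counts all rows
attempt <- DT[, list(COUNT = length(VAL == VAL[.I]),
                     IDX = which(which(VAL == VAL[.I]) == .I))]

# With `by`, .I holds the original row numbers belonging to each group
rows_by_group <- DT[, list(ROW = .I), by = VAL]
print(rows_by_group)
```

Here COUNT is the total row count on every line and IDX is simply the overall row number, which is why grouping with `by` is needed.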
So, what's the data.table way?
Using .N:
DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
# VAL COUNT IDX
# 1: 1 3 1
# 2: 2 4 1
# 3: 2 4 2
# 4: 3 3 1
# 5: 1 3 2
# 6: 3 3 2
# 7: 3 3 3
# 8: 2 4 3
# 9: 2 4 4
#10: 1 3 3
.N is the number of records in each group, with groups defined by "VAL".
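As a self-contained variant of the same idea, the sketch below types VAL in by hand from the printed table (an assumption, so the result does not depend on the random number generator) and uses seq_len(.N) in place of 1:.N. The extra column IDX2 is only illustrative: it uses data.table's rowid(), which should give the same within-group counter without `by`.

```r
library(data.table)

# VAL copied from the printed table (assumption: avoids depending on the RNG)
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# .N is the group size; seq_len(.N) numbers the rows inside each group
DT[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]

# Illustrative alternative for the index: rowid() counts occurrences of each value
DT[, IDX2 := rowid(VAL)]

print(DT)
```

Because := adds the columns by reference, the original row order of DT is kept.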