Unexpected result using data.table's shift() by group (bug?)
Consider this dataset:
library(data.table)
dt <- data.table(ID = c(1,8,9,20,32,33), Char = c("A", "A", "B", "B", "C", "C"))
dt
   ID Char
1:  1    A
2:  8    A
3:  9    B
4: 20    B
5: 32    C
6: 33    C
I want to identify "runs" by ID, i.e. consecutive rows where the ID differs by 1, but I only want to consider runs within the same Char group. I can do this as follows:
dt[, InRun := FALSE]
dt[, DistToAbove := abs(ID - shift(ID, type="lag")), by=Char]
dt[, DistToBelow := abs(ID - shift(ID, type="lead")), by=Char]
dt[DistToAbove <= 1 | DistToBelow <= 1, InRun := TRUE, by=Char]
dt
   ID Char InRun DistToAbove DistToBelow
1:  1    A FALSE          NA           7
2:  8    A FALSE           7          NA
3:  9    B FALSE          NA          11
4: 20    B FALSE          11          NA
5: 32    C  TRUE          NA           1
6: 33    C  TRUE           1          NA
I tried simplifying the above code into the lines below, but the answer differs:
dt[, InRun := FALSE]
dt[abs(ID - shift(ID, type="lag")) <= 1 | abs(shift(ID, type="lead") - ID) <= 1, InRun := TRUE, by=Char]
dt
   ID Char InRun DistToAbove DistToBelow
1:  1    A FALSE          NA           7
2:  8    A  TRUE           7          NA
3:  9    B  TRUE          NA          11
4: 20    B FALSE          11          NA
5: 32    C  TRUE          NA           1
6: 33    C  TRUE           1          NA
What gives? (Note: I'm using data.table v1.9.7.)
I want to identify "runs" by ID, ie consecutive rows where the ID differs by 1, but I only want to consider runs within the same Char group.
Here's how I'd approach it:
dt[, run_id := cumsum(
( ID != shift(ID, fill = ID[1L]) + 1L )
|
( Char != shift(Char, fill = Char[1L]) )
)]
dt[, in_run := .N > 1L, by=.(Char, run_id)]
   ID Char run_id in_run
1:  1    A      1  FALSE
2:  8    A      2  FALSE
3:  9    B      3  FALSE
4: 20    B      4  FALSE
5: 32    C      5   TRUE
6: 33    C      5   TRUE
This code identifies all runs (including those with length of one) and then tests for length greater than one (the OP's definition).
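To make the intermediate step visible, here is a small sketch that rebuilds run_id on the same data and then lists one row per run with its length. The names run_lengths and n are my own for illustration and are not part of the code above.

```r
library(data.table)

dt <- data.table(ID = c(1, 8, 9, 20, 32, 33),
                 Char = c("A", "A", "B", "B", "C", "C"))

# A new run starts whenever ID is not previous ID + 1, or Char changes.
dt[, run_id := cumsum(
  (ID != shift(ID, fill = ID[1L]) + 1L) |
  (Char != shift(Char, fill = Char[1L]))
)]

# One row per run with its length (run_lengths and n are illustrative names).
run_lengths <- dt[, .(n = .N), by = .(Char, run_id)]
run_lengths
```

Only the last run (Char C, run_id 5) has more than one row, which is why in_run is TRUE for rows 5 and 6 only.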
Regarding the OP's approach:
dt[abs(ID - shift(ID, type="lag")) <= 1 | abs(shift(ID, type="lead") - ID) <= 1, # i
InRun := TRUE # j
, by=Char] # by
In DT[i,j,by] the steps are: filter using i, then group with by, then calculate j. You can't do by-group calculations in i in the way attempted here: the shift() calls in i run over the whole table, so rows 2 and 3 (IDs 8 and 9) are compared even though they belong to different Char groups, which is why both come out TRUE.
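A minimal sketch of the difference, assuming the same dt as in the question. The first expression reproduces what i sees (no grouping); the second moves the comparison into j so that by=Char applies. The helper names ungrouped_lag, above and below are mine, and the explicit NA handling is one possible choice rather than the only way to write it.

```r
library(data.table)

dt <- data.table(ID = c(1, 8, 9, 20, 32, 33),
                 Char = c("A", "A", "B", "B", "C", "C"))

# What i effectively computes: shift() over the whole column, ignoring Char,
# so ID 8 (group A) and ID 9 (group B) look like neighbours.
ungrouped_lag <- dt[, abs(ID - shift(ID, type = "lag"))]

# The same test done in j, where by = Char is in effect.
dt[, InRun := {
  above <- abs(ID - shift(ID, type = "lag"))
  below <- abs(shift(ID, type = "lead") - ID)
  # Treat the NA at each group edge as "no neighbour on that side".
  (!is.na(above) & above <= 1) | (!is.na(below) & below <= 1)
}, by = Char]
dt
```

With the comparison inside j, only the two rows in group C are flagged, matching the result of the multi-step version in the question.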