R data.table: how to change each preceding 0 into a 1 within a column?
I have the following R data.table, which is composed of only one column:
library(data.table)
DT <- data.table(first_column = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0))
> DT
first_column
1: 0
2: 0
3: 0
4: 1
5: 1
6: 1
7: 0
8: 0
9: 1
10: 1
11: 0
12: 0
13: 0
14: 0
15: 1
16: 1
17: 1
18: 1
19: 1
20: 0
21: 0
... ...
The binary column first_column is composed of "clusters" of consecutive ones.
I would like to take the 0 immediately preceding each cluster and turn it into a 1. That is, wherever a run of 1s begins, the 0 directly before it should become a 1.
EDIT: To be more clear, the pattern 0001110011000011111... would become 0011110111000111111...
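The two patterns in the EDIT can be turned into vectors to see exactly which positions are meant to change. This is only a small sketch in base R; the helper name `bits` is made up for illustration and is not part of the question.

```r
# Hypothetical helper: turn a string of 0/1 digits into a numeric vector
bits <- function(s) as.numeric(strsplit(s, "")[[1]])

input    <- bits("0001110011000011111")
expected <- bits("0011110111000111111")

# The positions that differ are the 0s sitting directly before a run of 1s
changed <- which(input != expected)
changed
# [1]  3  8 14
```

Any candidate solution can be checked against `expected` on this prefix of the data.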
Try this using diff:
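# which() turns the test into integer positions; diff() returns n - 1 values,
# so a bare logical index would be recycled onto the last element
DT$first_column[which(diff(DT$first_column) == 1)] <- 1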
# first_column
# 1: 0
# 2: 0
# 3: 1
# 4: 1
# 5: 1
# 6: 1
# 7: 0
# 8: 1
# 9: 1
# 10: 1
# 11: 0
# 12: 0
# 13: 0
# 14: 1
# 15: 1
# 16: 1
# 17: 1
# 18: 1
# 19: 1
# 20: 0
# 21: 0
# first_column
Basically, diff outputs 1 at every position where a 0 is immediately followed by a 1, so those positions are exactly the 0s that need to change.
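To see what diff is doing, here is a minimal sketch on a toy vector (not the data from the question). It also shows one thing to watch for: diff returns one element fewer than its input, so a logical index built from it is recycled when used on the full vector. Wrapping the test in which() avoids that.

```r
x <- c(0, 1, 1, 0, 0, 1, 0)   # toy vector, chosen to start with 0, 1

d <- diff(x)                  # x[i + 1] - x[i], length(x) - 1 elements
d
# [1]  1  0 -1  0  1 -1

idx <- which(d == 1)          # positions of a 0 directly followed by a 1
idx
# [1] 1 5

y <- x
y[idx] <- 1                   # integer positions: only rows 1 and 5 change
y
# [1] 1 1 1 0 1 1 0

z <- x
z[d == 1] <- 1                # logical index of length 6 recycled over 7 elements
z
# [1] 1 1 1 0 1 1 1
```

In `z` the trailing 0 is overwritten because the seventh element reuses the first index value. With the data from the question the first difference is 0, so the problem does not show up there.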
This will replace the final value of each 0/1 "group" with a 1. That is redundant for the groups of 1s, but it is what you want to accomplish for the groups of 0s (if I read your question correctly).
DT[, c(head(first_column, -1), 1), by=rleid(first_column)]
rleid is used to group adjacent 0s and 1s, and head with -1 keeps all but the final element.
Or even better, you can use replace, as @Frank suggests, like this:
DT[, replace(first_column, .N, 1), by=rleid(first_column)]
where .N is used to specify the final row in the group.
Both of these return
rleid V1
1: 1 0
2: 1 0
3: 1 1
4: 2 1
5: 2 1
6: 2 1
7: 3 0
8: 3 1
9: 4 1
10: 4 1
11: 5 0
12: 5 0
13: 5 0
14: 5 1
15: 6 1
16: 6 1
17: 6 1
18: 6 1
19: 6 1
20: 7 0
21: 7 1
rleid V1
These solutions (incorrectly) fill in the final observation of the table with a 1, because the last group is treated like any other. One way to avoid this is to add a check before filling in the values.
DT[, if(.I[.N] < nrow(DT)) replace(first_column, .N, 1) else first_column,
by=rleid(first_column)]
Here, .I[.N] < nrow(DT) returns TRUE for every group except the final group, so the final observation of that group is left "as is."
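The grouped calls above return a new table with an added rleid column. If you would rather update the column in place, something along these lines should work. It is only a sketch of the same idea; the names `run` and `last_rows` are made up for illustration.

```r
library(data.table)

DT <- data.table(first_column = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0))

# Label each run of identical values: 1, 1, 1, 2, 2, 2, 3, 3, ...
DT[, run := rleid(first_column)]

# Row number of the last row in each run, dropping the table's final row
last_rows <- DT[, .I[.N], by = run]$V1
last_rows <- last_rows[last_rows < nrow(DT)]

# Update by reference, then remove the temporary column
DT[last_rows, first_column := 1]
DT[, run := NULL]

DT$first_column
# [1] 0 0 1 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0
```

The last rows of the 1-runs are included in `last_rows` as well, which is harmless since they already hold a 1.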
If I understood the OP correctly, they want to turn any occurrence of the sub-sequence 0,1 into 1,1:
DT <- data.table(first_column = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0))
DT[first_column == 0 & shift(first_column, type = "lead") == 1, first_column := 1]
DT[, first_column]
# [1] 0 0 1 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0
At the expense of implicit type conversions from double to logical, this can be written more concisely as:
DT <- data.table(first_column = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0))
DT[!first_column & shift(first_column, type = "lead"), first_column := 1]
DT[, first_column]
# [1] 0 0 1 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0
This relies on the fact that 0 is treated as FALSE and any number other than 0 as TRUE.
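A quick sketch of what the two conditions evaluate to, using a toy vector that deliberately contains a 2 (the column in the question is strictly 0/1, so this is just to show where the two forms differ). The variable names are made up for illustration.

```r
library(data.table)

x <- c(0, 1, 0, 2, 0)            # toy vector; note the non-binary 2

nxt <- shift(x, type = "lead")   # next value; NA past the end
nxt
# [1]  1  0  2  0 NA

strict <- x == 0 & nxt == 1      # explicit comparisons
strict
# [1]  TRUE FALSE FALSE FALSE    NA

loose <- !x & nxt                # relies on coercion to logical
loose
# [1]  TRUE FALSE  TRUE FALSE    NA
```

On 0/1 data the two agree. They differ at position 3 here because 2 is coerced to TRUE but is not equal to 1. The trailing NA comes from the lead of the last element; data.table treats NA in a logical i as FALSE, so the last row is never modified.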