Finding values in consecutive rows
An example of the data frame I have is given below.
ID X
1 1
2 2
3 1
4 0
5 0
6 1
7 4
8 5
9 6
10 7
11 0
12 0
I want to check whether 3 or more consecutive rows have a value > 0 in X. If they do, I want to flag those rows in another column, so the output would look as follows.
ID X Y
1 1 1
2 2 1
3 1 1
4 0 0
5 0 0
6 1 1
7 4 1
8 5 1
9 6 1
10 7 1
11 0 0
12 0 0
EXTENSION - How would I get the following output, giving a different Y value for each group?
ID X Y
1 1 1
2 2 1
3 1 1
4 0 0
5 0 0
6 1 2
7 4 2
8 5 2
9 6 2
10 7 2
11 0 0
12 0 0
One option with base R: use rle to find the runs of adjacent values in 'X' that are greater than 0, then replicate (rep) the per-run result based on the lengths.
df1$Y <- with(rle(df1$X > 0), as.integer(rep(values & lengths > 2, lengths)))
df1$Y
#[1] 1 1 1 0 0 1 1 1 1 1 0 0
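To make the one-liner above easier to follow, here is a step-by-step sketch of the same rle/rep idea. The vector x and the intermediate names r, keep and y are introduced only for illustration; the values are the X column from the question.

```r
# X column from the question (hypothetical standalone vector)
x <- c(1, 2, 1, 0, 0, 1, 4, 5, 6, 7, 0, 0)

# Run-length encode the condition "value is positive"
r <- rle(x > 0)
r$values   # TRUE FALSE TRUE FALSE
r$lengths  # 3 2 5 2

# A run qualifies when it is a run of TRUEs with length of at least 3
keep <- r$values & r$lengths > 2

# Expand the per-run decision back to one value per row
y <- as.integer(rep(keep, r$lengths))
y
# 1 1 1 0 0 1 1 1 1 1 0 0
```

Because the decision is made once per run and then expanded with rep, a run of three or more zeros is never flagged: its entry in r$values is FALSE.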
For the updated case in the OP's post:
df1$Y <- inverse.rle(within.list(rle(df1$X > 0), {
i1 <- values & (lengths > 2)
values[i1] <- seq_along(values[i1])}))
df1$Y
#[1] 1 1 1 0 0 2 2 2 2 2 0 0
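The numbering step can also be written with cumsum instead of within.list, which some readers may find easier to trace. This is only a sketch of an equivalent approach, not the code from the answer above; x, r, keep and y are illustrative names.

```r
# X column from the question (hypothetical standalone vector)
x <- c(1, 2, 1, 0, 0, 1, 4, 5, 6, 7, 0, 0)

r <- rle(x > 0)
keep <- r$values & r$lengths > 2   # TRUE FALSE TRUE FALSE

# cumsum(keep) counts the qualifying runs seen so far (1 1 2 2);
# multiplying by keep resets the non-qualifying runs to 0 (1 0 2 0)
r$values <- cumsum(keep) * keep

# Expand the per-run labels back to one value per row
y <- inverse.rle(r)
y
# 1 1 1 0 0 2 2 2 2 2 0 0
```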
Or using rleid from data.table:
library(data.table)
setDT(df1)[, Y := as.integer((.N > 2) * (X > 0)), by = rleid(X > 0)]
# Data used in the examples above
df1 <- structure(list(ID = 1:12, X = c(1L, 2L, 1L, 0L, 0L, 1L, 4L, 5L,
6L, 7L, 0L, 0L)), class = "data.frame", row.names = c(NA, -12L
))
We can use rleid from data.table to create run groups, pass them to ave to get the length of each group, and assign 1 to rows in runs of positive values whose length is at least 3. Checking X > 0 as well keeps runs of three or more zeros from being flagged.
library(data.table)
df$Y <- as.integer(df$X > 0 & ave(df$X, rleid(df$X > 0), FUN = length) >= 3)
df
# ID X Y
#1 1 1 1
#2 2 2 1
#3 3 1 1
#4 4 0 0
#5 5 0 0
#6 6 1 1
#7 7 4 1
#8 8 5 1
#9 9 6 1
#10 10 7 1
#11 11 0 0
#12 12 0 0
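If data.table is not available, run ids like the ones rleid produces here can be built in base R with cumsum. The following is a minimal sketch under that assumption; x, pos, run_id, run_len and y are illustrative names, and x is the X column from the question.

```r
# X column from the question (hypothetical standalone vector)
x <- c(1, 2, 1, 0, 0, 1, 4, 5, 6, 7, 0, 0)
pos <- x > 0

# A new run starts at row 1 and wherever the condition flips
run_id <- cumsum(c(TRUE, pos[-1] != pos[-length(pos)]))
run_id
# 1 1 1 2 2 3 3 3 3 3 4 4

# Length of the run each row belongs to
run_len <- ave(x, run_id, FUN = length)

# Flag rows that are positive and sit in a run of at least 3 rows
y <- as.integer(pos & run_len >= 3)
y
# 1 1 1 0 0 1 1 1 1 1 0 0
```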
EDIT
For the updated post, we could combine the data.table part above with dplyr:
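library(dplyr)
library(data.table)
df %>%
  # one group per run of consecutive rows sharing the same X > 0 status
  group_by(group = rleid(X > 0)) %>%
  # mark the first row of each run of 3 or more positive values
  mutate(Y = ifelse(n() >= 3 & X > 0 & row_number() == 1, 1, 0)) %>%
  ungroup() %>%
  # number the marked rows 1, 2, ...; unmarked rows stay 0
  mutate(Y = cumsum(Y) * Y) %>%
  group_by(group) %>%
  # spread each run's number to all of its rows
  mutate(Y = first(Y)) %>%
  ungroup() %>%
  select(-group)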
# ID X Y
# <int> <int> <dbl>
# 1 1 1 1
# 2 2 2 1
# 3 3 1 1
# 4 4 0 0
# 5 5 0 0
# 6 6 1 2
# 7 7 4 2
# 8 8 5 2
# 9 9 6 2
#10 10 7 2
#11 11 0 0
#12 12 0 0