
Finding values in consecutive rows

An example of the data frame I have is given below.

ID     X      
1      1     
2      2      
3      1      
4      0      
5      0      
6      1      
7      4
8      5 
9      6
10     7
11     0 
12     0

I want to apply logic that checks whether 3 or more consecutive rows have a value > 0. If they do, I want to flag them in another column. Hence the output will look as follows.

ID     X      Y
1      1      1
2      2      1
3      1      1
4      0      0
5      0      0
6      1      1
7      4      1
8      5      1
9      6      1
10     7      1
11     0      0
12     0      0

EXTENSION - How would I get the following output, giving a different Y value for each group?

ID     X      Y
1      1      1
2      2      1
3      1      1
4      0      0
5      0      0
6      1      2
7      4      2
8      5      2
9      6      2
10     7      2
11     0      0
12     0      0

One option with base R. Use rle to find the runs of adjacent values in 'X' that are greater than 0, then replicate the result with rep based on the run lengths.

df1$Y <- with(rle(df1$X > 0), as.integer(rep(values & lengths > 2, lengths)))
df1$Y
#[1] 1 1 1 0 0 1 1 1 1 1 0 0
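To see why the one-liner works, it helps to look at the intermediate rle object. A minimal base R sketch on the example data, assuming the X column is pulled out into a plain vector `x` (our name, not from the post):

```r
# X column from the question's example data
x <- c(1L, 2L, 1L, 0L, 0L, 1L, 4L, 5L, 6L, 7L, 0L, 0L)

# rle() collapses the logical vector into runs:
# r$values says whether each run is positive, r$lengths how long it is
r <- rle(x > 0)
print(r$values)   # TRUE FALSE TRUE FALSE
print(r$lengths)  # 3 2 5 2

# A run qualifies if it is positive and at least 3 rows long;
# rep() expands the per-run answer back to one value per row
y <- as.integer(rep(r$values & r$lengths >= 3, r$lengths))
print(y)          # 1 1 1 0 0 1 1 1 1 1 0 0
```

Because run lengths are whole numbers, `lengths >= 3` and `lengths > 2` select the same runs.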

For the updated case in the OP's post (a different Y value for each group):

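# Number only the runs that are positive and longer than 2 rows;
# every other run keeps value 0, then expand back to one value per row
df1$Y <- inverse.rle(within.list(rle(df1$X > 0), {
  i1 <- values & (lengths > 2)
  values[i1] <- seq_along(values[i1])
}))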
df1$Y
#[1] 1 1 1 0 0 2 2 2 2 2 0 0
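The within.list() call packs three steps into a single expression. A minimal base R sketch that spells them out one at a time, again assuming the X column is available as a plain vector `x` (the variable names are ours):

```r
# X column from the question's example data
x <- c(1L, 2L, 1L, 0L, 0L, 1L, 4L, 5L, 6L, 7L, 0L, 0L)

r <- rle(x > 0)

# Step 1: which runs qualify (positive and at least 3 rows long)
keep <- r$values & r$lengths >= 3

# Step 2: give non-qualifying runs 0 and number the qualifying ones 1, 2, ...
ids <- integer(length(r$values))
ids[keep] <- seq_len(sum(keep))

# Step 3: put the ids back into the rle object and expand to one value per row
r$values <- ids
y <- inverse.rle(r)
print(y)  # 1 1 1 0 0 2 2 2 2 2 0 0
```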

Or, for the original 0/1 flag, using rleid from data.table:

library(data.table)
setDT(df1)[, Y := as.integer((.N > 2) * (X > 0)), by = rleid(X > 0)]
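rleid() starts a new id every time the value of X > 0 changes. For readers without data.table, here is a minimal base R stand-in built from cumsum() over the change points; it is our own construction, not data.table's implementation, and it assumes `x` is non-empty:

```r
# X column from the question's example data
x <- c(1L, 2L, 1L, 0L, 0L, 1L, 4L, 5L, 6L, 7L, 0L, 0L)
pos <- x > 0

# A new run starts at row 1 and wherever pos differs from the previous row
run_id <- cumsum(c(TRUE, pos[-1] != pos[-length(pos)]))
print(run_id)  # 1 1 1 2 2 3 3 3 3 3 4 4

# Equivalent of .N within each run, then the same 0/1 rule as above
run_len <- ave(seq_along(x), run_id, FUN = length)
y <- as.integer(pos & run_len >= 3)
print(y)       # 1 1 1 0 0 1 1 1 1 1 0 0
```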

Data

df1 <- structure(list(ID = 1:12, X = c(1L, 2L, 1L, 0L, 0L, 1L, 4L, 5L, 
 6L, 7L, 0L, 0L)), class = "data.frame", row.names = c(NA, -12L
 ))

We can use rleid from data.table to create groups, use them in ave to get the length of each group, and assign 1 to the groups whose length is greater than or equal to 3.

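library(data.table)
# Flag rows that are > 0 AND sit in a run of at least 3 rows
# (the X > 0 check stops runs of three or more zeros from being flagged)
df$Y <- as.integer(df$X > 0 & ave(df$X, rleid(df$X > 0), FUN = length) >= 3)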
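A length test on its own treats runs of zeros the same as runs of positive values; the example data just happens to contain no run of three or more zeros. A minimal sketch on a small hypothetical vector that does, with a base R cumsum() run id standing in for rleid() so it needs no packages:

```r
# Hypothetical data: contains a run of three zeros
x <- c(1L, 2L, 1L, 0L, 0L, 0L, 5L)
pos <- x > 0

# Base R run id (new id whenever pos changes) and per-row run length
run_id <- cumsum(c(TRUE, pos[-1] != pos[-length(pos)]))
run_len <- ave(seq_along(x), run_id, FUN = length)

length_only <- as.integer(run_len >= 3)        # also flags the zeros
with_check  <- as.integer(pos & run_len >= 3)  # only positive runs
print(length_only)  # 1 1 1 1 1 1 0
print(with_check)   # 1 1 1 0 0 0 0
```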

df
#   ID X Y
#1   1 1 1
#2   2 2 1
#3   3 1 1
#4   4 0 0
#5   5 0 0
#6   6 1 1
#7   7 4 1
#8   8 5 1
#9   9 6 1
#10 10 7 1
#11 11 0 0
#12 12 0 0

EDIT

For the updated post, we could combine the data.table part above with dplyr:

library(dplyr)
library(data.table)

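df %>%
  group_by(group = rleid(X > 0)) %>%
  # 1 on the first row of each positive run with at least 3 rows, else 0
  mutate(Y = ifelse(X > 0 & n() >= 3 & row_number() == 1, 1, 0)) %>%
  ungroup() %>%
  # cumsum() numbers those first rows 1, 2, ...; other rows stay 0
  mutate(Y = cumsum(Y) * Y) %>%
  group_by(group) %>%
  # copy each run's number to all of its rows
  mutate(Y = first(Y)) %>%
  ungroup() %>%
  select(-group)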
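The pipeline marks the first row of each qualifying run, numbers those marks with cumsum(), then copies the number across the run. A minimal base R sketch of the same idea (variable names are ours); multiplying by the row-level qualifying flag replaces the second group_by()/first() step, because the running count cannot change inside a run:

```r
# X column from the question's example data
x <- c(1L, 2L, 1L, 0L, 0L, 1L, 4L, 5L, 6L, 7L, 0L, 0L)
pos <- x > 0

# Base R run id and run length (stand-ins for rleid() and n())
run_id <- cumsum(c(TRUE, pos[-1] != pos[-length(pos)]))
run_len <- ave(seq_along(x), run_id, FUN = length)
qualifies <- pos & run_len >= 3

# TRUE only on the first row of each qualifying run
starts <- qualifies & !duplicated(run_id)

# cumsum() holds each run's number until the next start;
# multiplying by 'qualifies' zeroes the rows outside qualifying runs
y <- cumsum(starts) * qualifies
print(y)  # 1 1 1 0 0 2 2 2 2 2 0 0
```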


#     ID     X     Y
#   <int> <int> <dbl>
# 1     1     1     1
# 2     2     2     1
# 3     3     1     1
# 4     4     0     0
# 5     5     0     0
# 6     6     1     2
# 7     7     4     2
# 8     8     5     2
# 9     9     6     2
#10    10     7     2
#11    11     0     0
#12    12     0     0
