Return 3 next and previous rows if value occurs in particular row

I have a data frame like this:

    Input = (" v1 v2
          1 A1 0
          2 B1 0
          3 C1 0
          4 D1 1
          5 E1 0
          6 F1 0
          7 G1 0
          8 H1 0
          9 I1 0
          10 J1 0
          11 K1 0
          12 A2 1
          13 B2 0
          14 C2 0
          15 D2 0
          16 E2 0
          17 F2 0
          18 G2 0
          19 H2 0
          20 I2 0
          21 J2 0
          22 K2 0
           ")
df <- read.table(text = Input, header = TRUE, row.names = 1)

I'd like to keep only the rows with 1 in v2, plus the 3 previous and 3 next rows around each 1, so the desired output is:

v1 v2
A1 0
B1 0
C1 0
D1 1
E1 0
F1 0
G1 0
I1 0
J1 0
K1 0
A2 1
B2 0
C2 0
D2 0

So we keep all the 1-rows (2 in this case) and the 6 neighboring rows of each (3 above, 3 below). In the original dataset I have 100k+ rows, with only a few 1-rows spread throughout.

I tried to do this with a simple ifelse() inside apply(), handling the previous and next rows separately and then combining everything, but it doesn't work.

prev <- as.data.frame(apply(df, 1, function(x) ifelse(x[1]==1,x-1:3,0)))
next <- as.data.frame(apply(df, 1, function(x) ifelse(x[1]==1,x+1:3,0)))

I was thinking of using lag() and lead(), but I don't know how to lag or lead by n = 3 rows only around the rows with 1 in v2. Could you please help me out?

One possible solution (a bit lengthy, but useful for other purposes as well) is to create multiple lags and leads of the variable of interest and then keep the rows where any of those variables equals 1.

We first create two functions that add n lags and n leads of a variable, respectively, to a data frame:

lags <- function(data, variable, n){
  require(dplyr)
  require(purrr)
  
  variable <- enquo(variable)
  
  indices <- seq_len(n)
  quosures <- map(indices, ~quo(lag(!!variable, !!.x))) %>% 
    set_names(sprintf("lag_%02d", indices))
  
  mutate(data, !!!quosures)
}



leads <- function(data, variable, n){
  require(dplyr)
  require(purrr)
  
  variable <- enquo(variable)
  
  indices <- seq_len(n)
  quosures <- map(indices, ~quo(lead(!!variable, !!.x))) %>% 
    set_names(sprintf("lead_%02d", indices))
  
  mutate(data, !!!quosures)
}

Then we apply them to our data frame and keep the observations that contain a 1 in any column:

library(dplyr)

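df %>% 
  lags(v2, n = 3) %>%                  # adds lag_01 .. lag_03
  leads(v2, n = 3) %>%                 # adds lead_01 .. lead_03
  # keep rows where any column equals 1
  # (filter_all()/any_vars() are superseded by if_any() in recent dplyr)
  filter_all(any_vars(. == 1)) %>% 
  select(v1, v2)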

#    v1 v2
# 1  A1  0
# 2  B1  0
# 3  C1  0
# 4  D1  1
# 5  E1  0
# 6  F1  0
# 7  G1  0
# 8  I1  0
# 9  J1  0
# 10 K1  0
# 11 A2  1
# 12 B2  0
# 13 C2  0
# 14 D2  0
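
To make the lag/lead idea easier to see, here is a minimal base R sketch of the same logic without dplyr. The helper name `shift` is hypothetical (it is not part of the answer above), and the sketch assumes the shift size is smaller than the vector length. A row is kept when v2, shifted by any offset from -3 to 3, equals 1 at that position.

```r
# Hypothetical helper: shift x by k positions, padding with NA.
# k > 0 behaves like a lag, k < 0 like a lead. Assumes abs(k) < length(x).
shift <- function(x, k) {
  n <- length(x)
  if (k >= 0) c(rep(NA, k), head(x, n - k)) else c(tail(x, n + k), rep(NA, -k))
}

# Same v2 column as in the question: 1 at rows 4 and 12, 0 elsewhere
v2 <- as.integer(seq_len(22) %in% c(4, 12))

# One column per offset: 3 leads, the original vector, 3 lags
shifted <- sapply(-3:3, function(k) shift(v2, k))

# Keep a row if any shifted copy equals 1 there (NA padding counts as no match)
keep <- rowSums(shifted == 1, na.rm = TRUE) > 0
which(keep)
```

With the real data frame, `df[keep, ]` would then select the rows.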

We can find the indices where v2 == 1 and use sapply to generate the row numbers from -3 to +3 around each index.

#get row index where v2 = 1
inds <- which(df$v2 == 1)
#unique to remove overlapping row index
inds2 <- unique(c(sapply(inds, `+`, -3:3)))
#remove negative values or values which are greater than number of rows in df
inds2 <- inds2[inds2 > 0 & inds2 <= nrow(df)]
#select rows.
df[inds2, ]

#   v1 v2
#1  A1  0
#2  B1  0
#3  C1  0
#4  D1  1
#5  E1  0
#6  F1  0
#7  G1  0
#9  I1  0
#10 J1  0
#11 K1  0
#12 A2  1
#13 B2  0
#14 C2  0
#15 D2  0
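
If the window size needs to vary, the index approach can be wrapped in a small function. This is only a sketch: the name `keep_around` and its arguments are hypothetical, and the data frame is rebuilt here so the example runs on its own. It uses `outer()` instead of `sapply()` and sorts the indices so rows stay in their original order even when windows overlap.

```r
# Hypothetical helper: keep rows within `n` rows of any row where `flag` is TRUE
keep_around <- function(data, flag, n = 3) {
  hits <- which(flag)
  # every hit index combined with each offset in -n..n
  idx <- outer(hits, -n:n, `+`)
  # drop out-of-range positions, de-duplicate, restore row order
  idx <- sort(unique(idx[idx >= 1 & idx <= nrow(data)]))
  data[idx, , drop = FALSE]
}

# Same data as in the question: 1 at rows 4 and 12, 0 elsewhere
df <- data.frame(
  v1 = c(paste0(LETTERS[1:11], 1), paste0(LETTERS[1:11], 2)),
  v2 = as.integer(seq_len(22) %in% c(4, 12))
)

res <- keep_around(df, df$v2 == 1, n = 3)
res
```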
