
dplyr lag across groups

I am trying to do something like a lag, but across groups rather than within them. Sample data:

df <- data.frame(flag = c("A", "B", "A", "B", "B", "B", "A", "B", "B", "A", "B"),
                 var = c("AB123","AC124", "AD125", "AE126",
                          "AF127", "AG128", "AF129",
                          "AG130","AH131",
                          "AHI132", "AJ133")
)

For every row with flag == "B", the goal is to create lagvar holding the var value from the most recent preceding row where flag == "A".

The desired output:

df1 <- data.frame(flag = c("A", "B", "A", "B", "B", "B", "A", "B", "B", "A", "B"),
                 var = c("AB123","AC124", "AD125", "AE126",
                          "AF127", "AG128", "AF129",
                          "AG130","AH131",
                          "AHI132", "AJ133"),
                 lagvar = c("","AB123","","AD125","AD125","AD125","","AF129","AF129","","AHI132")
)

A dplyr solution is preferred, but I'm not picky!

EDIT: I found a solution using the zoo package, but I am interested in whether others have better ideas.

library(dplyr)
library(zoo)
df$lagvar <- ifelse(df$flag == "A", as.character(df$var), NA)  # keep var only on A rows
df <- df %>% mutate(lagvar = na.locf(lagvar, na.rm = FALSE))   # carry the last A value forward

Here you go. I used NA instead of blanks, but you can adjust as needed:

df %>% mutate(lagvar = ifelse(flag == "A", as.character(var), NA),
              lagvar = zoo::na.locf(lagvar),
              lagvar = ifelse(flag == "A", NA, lagvar))
#    flag    var lagvar
# 1     A  AB123   <NA>
# 2     B  AC124  AB123
# 3     A  AD125   <NA>
# 4     B  AE126  AD125
# 5     B  AF127  AD125
# 6     B  AG128  AD125
# 7     A  AF129   <NA>
# 8     B  AG130  AF129
# 9     B  AH131  AF129
# 10    A AHI132   <NA>
# 11    B  AJ133 AHI132
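
If blanks are wanted instead of NA, one possible adjustment is sketched below. It is not the code above: it swaps zoo::na.locf for tidyr::fill (a different function that also carries the last non-missing value forward) and assumes var is a character column, hence stringsAsFactors = FALSE. The name result is just an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; var is kept as character (assumption)
df <- data.frame(
  flag = c("A", "B", "A", "B", "B", "B", "A", "B", "B", "A", "B"),
  var  = c("AB123", "AC124", "AD125", "AE126", "AF127", "AG128",
           "AF129", "AG130", "AH131", "AHI132", "AJ133"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # keep var only on A rows, NA elsewhere
  mutate(lagvar = if_else(flag == "A", var, NA_character_)) %>%
  # carry the last A value down into the following B rows
  fill(lagvar, .direction = "down") %>%
  # blank out the A rows, and any B rows that come before the first A
  mutate(lagvar = if_else(flag == "A" | is.na(lagvar), "", lagvar))
```

With the sample data, result$lagvar should match the lagvar column of df1 in the question.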

My solution is a bit more complicated. The idea is to work out, for each B row, the position of the A row it belongs to, and then join against a table that contains only the flag A rows.

df %>%
  mutate(pos=cumsum(flag == "A")) %>%
  left_join(
    df %>%
      filter(flag == "A") %>%
      mutate(pos=1:n()) %>%
      select(pos, lagvar=var),
    by="pos") %>%
  mutate(lagvar=ifelse(flag == "A", "", as.character(lagvar)))

#    flag    var pos lagvar
# 1     A  AB123   1       
# 2     B  AC124   1  AB123
# 3     A  AD125   2       
# 4     B  AE126   2  AD125
# 5     B  AF127   2  AD125
# 6     B  AG128   2  AD125
# 7     A  AF129   3       
# 8     B  AG130   3  AF129
# 9     B  AH131   3  AF129
# 10    A AHI132   4       
# 11    B  AJ133   4 AHI132
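
The same position idea can also be sketched without a join, by using the running count of A rows as an index into the vector of A values. This is a base R variant rather than the answer's own code; a_vals and pos are illustrative names, and var is assumed to be a character column.

```r
# Sample data from the question; var is kept as character (assumption)
df <- data.frame(
  flag = c("A", "B", "A", "B", "B", "B", "A", "B", "B", "A", "B"),
  var  = c("AB123", "AC124", "AD125", "AE126", "AF127", "AG128",
           "AF129", "AG130", "AH131", "AHI132", "AJ133"),
  stringsAsFactors = FALSE
)

# var values of the A rows, in order of appearance
a_vals <- df$var[df$flag == "A"]

# For each row, the number of A rows seen so far (0 means none yet)
pos <- cumsum(df$flag == "A")
pos[pos == 0] <- NA  # B rows before the first A have nothing to look up

# B rows take the value of "their" A row; A rows get a blank
df$lagvar <- ifelse(df$flag == "B", a_vals[pos], "")
```

Any B row that appears before the first A ends up with NA here, since there is no earlier A value to borrow.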
