
R new variable by group looped on multiple lagged and lead values

Let's say I have three variables, id, date and trad (trad has 3 possible values and can take any of them at any point in time):

library(tidyverse) 
dput(df)
    structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 2, 2, 2), date = structure(c(16436, 16437, 16438, 16439, 
    16440, 16441, 16442, 16443, 16444, 16445, 16446, 16447, 16448, 
    16449, 16450, 16451, 16452, 16453, 16454), class = "Date"), trad = c("Free", 
    "Suspended", "Suspended", "Free", "Suspended", "Withdrawn", "Withdrawn", 
    "Free", "Withdrawn", "Free", "Free", "Withdrawn", "Suspended", 
    "Withdrawn", "Withdrawn", "Free", "Withdrawn", "Suspended", "Free"
    )), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
    -19L), spec = structure(list(cols = list(id = structure(list(), class = c("collector_double", 
    "collector")), date = structure(list(format = "%d/%m/%Y"), class = c("collector_date", 
    "collector")), trad = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))
    df
    # A tibble: 19 x 3
          id date       trad     
       <dbl> <date>     <chr>    
     1     1 2015-01-01 Free     
     2     1 2015-01-02 Suspended
     3     1 2015-01-03 Suspended
     4     1 2015-01-04 Free     
     5     1 2015-01-05 Suspended
     6     1 2015-01-06 Withdrawn
     7     1 2015-01-07 Withdrawn
     8     1 2015-01-08 Free     
     9     1 2015-01-09 Withdrawn
    10     1 2015-01-10 Free     
    11     1 2015-01-11 Free     
    12     1 2015-01-12 Withdrawn
    13     1 2015-01-13 Suspended
    14     1 2015-01-14 Withdrawn
    15     1 2015-01-15 Withdrawn
    16     1 2015-01-16 Free     
    17     2 2015-01-17 Withdrawn
    18     2 2015-01-18 Suspended
    19     2 2015-01-19 Free 

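I want to generate new columns holding the start date and end date of each period. A period begins when trad moves into the warning state "Withdrawn"; if a "Suspended" row comes directly before the "Withdrawn" row, the start date moves back to that row. If several "Suspended" rows precede the "Withdrawn", the period starts at the first "Suspended". Similarly, the end date is the date on which trad becomes "Free" after having been "Withdrawn". This is the required final dataset: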

dfnew
# A tibble: 19 x 6
      id date       trad      start      end        period
   <dbl> <date>     <chr>     <date>     <date>      <dbl>
 1     1 2015-01-01 Free      NA         NA             NA
 2     1 2015-01-02 Suspended NA         NA             NA
 3     1 2015-01-03 Suspended NA         NA             NA
 4     1 2015-01-04 Free      NA         NA             NA
 5     1 2015-01-05 Suspended 2015-01-05 NA              1
 6     1 2015-01-06 Withdrawn NA         NA              1
 7     1 2015-01-07 Withdrawn NA         NA              1
 8     1 2015-01-08 Free      NA         2015-01-08      1
 9     1 2015-01-09 Withdrawn 2015-01-09 NA              2
10     1 2015-01-10 Free      NA         2015-01-10      2
11     1 2015-01-11 Free      NA         NA             NA
12     1 2015-01-12 Withdrawn 2015-01-12 NA              3
13     1 2015-01-13 Suspended NA         NA              3
14     1 2015-01-14 Withdrawn NA         NA              3
15     1 2015-01-15 Withdrawn NA         NA              3
16     1 2015-01-16 Free      NA         2015-01-16     NA
17     2 2015-01-17 Withdrawn 2015-01-17 NA              1
18     2 2015-01-18 Suspended NA         NA              1
19     2 2015-01-19 Free      NA         2015-01-19      1 

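There is no fixed pattern in trad, so any sequence of "Withdrawn"/"Suspended" can occur before "Free". A solution along these lines therefore does not work (in theory it could, but it would need far too many conditions):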

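dfnew <- df %>% 
  group_by(id) %>% 
  # only catches a "Withdrawn" that directly follows a "Free"; if_else() keeps start as a Date
  mutate(start = if_else(trad == "Withdrawn" & lag(trad) == "Free", date, as.Date(NA)))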
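A side note that is independent of the sequencing logic: when a date column is filled conditionally, base `ifelse()` returns plain numbers (days since 1970-01-01) rather than dates, whereas `dplyr::if_else()` keeps the `Date` class as long as both branches are dates. A minimal sketch, where the variable names `d`, `base_result` and `tidy_result` are only illustrative:

```r
library(dplyr)

d <- as.Date("2015-01-05")

# Base ifelse() strips attributes, so the Date comes back as a bare number.
base_result <- ifelse(TRUE, d, NA)
print(class(base_result))

# dplyr::if_else() preserves the class when both branches are Dates.
tidy_result <- if_else(TRUE, d, as.Date(NA))
print(class(tidy_result))
```

If the numeric form has already been produced, `as.Date(x, origin = "1970-01-01")` converts it back.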

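These questions were helpful but do not answer this one:

How to extract the previous n rows where a certain column value cannot be a particular value?

R - Conditional lagging - How to lag a certain amount of cells until a condition is met?

Would someone be able to provide a flexible solution?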

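Not very flexible, but at least something to try.

I am not sure what should happen for a sequence such as "Suspended", "Suspended", "Withdrawn", "Withdrawn".

For example, change trad on 2015-01-04 to "Suspended". What is the start date in that case? I have provided 2 solutions: with the first, the start date is 2015-01-02; with the second, it is 2015-01-05.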

# NOTE: neither solution groups by id; on the sample data this works because id 1 ends on a "Free" row
dfnew1 <- df %>% 
    mutate(startGroups = cumsum(trad == "Free")) %>% 
    group_by(startGroups) %>% # a new group begins at every occurrence of "Free" in trad
    mutate(wds = cumsum(trad == "Withdrawn"),
           # if the group contains any "Withdrawn", start on the row right after "Free"
           start = if_else(max(wds) > 0 & row_number() == 2, date, as.Date(NA))
           ) %>% 
    ungroup() %>% 
    mutate(endGroups = cumsum(!is.na(start))) %>% 
    group_by(endGroups) %>% # a new group begins at every start date
    mutate(frees = cumsum(trad == "Free"),
           # end on the first "Free" after a start; the trad check keeps later non-"Free" rows unmarked
           end = if_else(trad == "Free" & frees == 1 & endGroups > 0, date, as.Date(NA))
           ) # %>% select(-c(startGroups, wds, endGroups, frees)) # drop the helper columns

dfnew2 <- df %>% 
    mutate(startGroups = cumsum(trad == "Free")) %>% 
    group_by(startGroups) %>% # a new group begins at every occurrence of "Free" in trad
    mutate(wds = cumsum(trad == "Withdrawn"),
           # start on the "Suspended" row directly before the group's first "Withdrawn",
           # or on that first "Withdrawn" itself when no "Suspended" precedes it
           start = if_else(
                        (trad == "Suspended" & lead(trad) == "Withdrawn" & lead(wds) == 1) |
                            (trad == "Withdrawn" & lag(trad) != "Suspended" & wds == 1), 
                        date, as.Date(NA))
    ) %>% 
    ungroup() %>% 
    mutate(endGroups = cumsum(!is.na(start))) %>% 
    group_by(endGroups) %>% # a new group begins at every start date
    mutate(frees = cumsum(trad == "Free"),
           # end on the first "Free" after a start
           end = if_else(trad == "Free" & frees == 1 & endGroups > 0, date, as.Date(NA))
    )  # %>% select(-c(startGroups, wds, endGroups, frees)) # drop the helper columns
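Neither solution above fills the period column or groups by id. One possible way to get both, offered as a rough sketch rather than a definitive answer: treat every stretch of rows up to and including the next "Free" as one spell per id, and keep only spells that contain at least one "Withdrawn". The helper columns `spell` and `warned` are names invented for this sketch, and `df` is rebuilt with `tibble()` only so the example is self-contained. It assumes a period always starts at the first non-"Free" row of its spell, which is how I read the question.

```r
library(dplyr)

# Sample data from the question, rebuilt so the sketch runs on its own.
df <- tibble(
  id   = c(rep(1, 16), rep(2, 3)),
  date = as.Date("2015-01-01") + 0:18,
  trad = c("Free", "Suspended", "Suspended", "Free", "Suspended", "Withdrawn",
           "Withdrawn", "Free", "Withdrawn", "Free", "Free", "Withdrawn",
           "Suspended", "Withdrawn", "Withdrawn", "Free", "Withdrawn",
           "Suspended", "Free")
)

dfnew <- df %>%
  arrange(id, date) %>%
  group_by(id) %>%
  # A closing "Free" row belongs to the spell before it, so the counter
  # increases on the row *after* each "Free".
  mutate(spell = cumsum(lag(trad == "Free", default = FALSE))) %>%
  group_by(id, spell) %>%
  mutate(
    # Only spells with at least one "Withdrawn" count as a period.
    warned = any(trad == "Withdrawn"),
    # First row of a qualifying spell (first "Suspended" or "Withdrawn").
    start  = if_else(warned & row_number() == 1, date, as.Date(NA)),
    # The "Free" row that closes a qualifying spell.
    end    = if_else(warned & trad == "Free", date, as.Date(NA))
  ) %>%
  group_by(id) %>%
  # Number the qualifying spells within each id.
  mutate(period = if_else(warned, cumsum(!is.na(start)), NA_integer_)) %>%
  ungroup() %>%
  select(-spell, -warned)

print(dfnew, n = 19)
```

On the sample data this gives the start and end dates from the question's table. One difference: row 16 gets period 3 here, whereas the table shows NA, which I assume is a slip since that row closes period 3. With trad on 2015-01-04 changed to "Suspended", the sketch starts the period on 2015-01-02, in line with the first solution.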
