How to build a conditional rolling sequence in character form in R

I have a data frame with 4 columns:

g is a group ID in the data.

x is a character-type ID on which I have to perform the computation.

date is the date of the action, to be used for some other analysis. (I have kept it here so that the order of the records is preserved.)

action is the action to perform. This column has two values, 'add' and 'reduce', which occur equally often within each group g: if a group has 5 records with 'add', it also has 5 records with 'reduce'. The column is stored as an ordered factor in the data frame.

Basically, I have to create a new column according to the following rules:

  • The first record in each group is always 'add', so the exact value of x is used there.

  • From the second record onward, within each group, the value of x is pasted onto the previous record's result when action is 'add'.

  • From the second record onward, within each group, the value of x is removed from the previous record's result when action is 'reduce'.

The dput of the sample data is:

df <- structure(list(g = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 
8L, 8L), x = c("1_", "1_", "2_", "2_", "2_", "1_", "2_", "1_", 
"1_", "1_", "1_", "2_", "2_", "2_", "1_", "2_", "3_", "3_", "3_", 
"3_", "4_", "4_", "3_", "3_", "3_", "3_", "3_", "3_", "4_", "5_", 
"4_", "5_", "6_", "6_", "6_", "6_", "7_", "7_", "7_", "7_", "8_", 
"8_", "8_", "8_"), date = structure(c(18262, 18264, 18265, 18266, 
18271, 18275, 18275, 18276, 18277, 18279, 18280, 18283, 18286, 
18287, 18288, 18291, 18262, 18264, 18275, 18276, 18277, 18288, 
18275, 18283, 18291, 18297, 18301, 18309, 18366, 18374, 18375, 
18381, 18309, 18319, 18328, 18347, 18364, 18367, 18303, 18309, 
18328, 18341, 18341, 18344), class = "Date"), action = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 
1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 
1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("add", 
"reduce"), class = c("ordered", "factor"))), row.names = c(NA, 
-44L), groups = structure(list(g = 1:8, .rows = structure(list(
    1:16, 17:22, 23:28, 29:32, 33:36, 37:38, 39:40, 41:44), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

df
# A tibble: 44 x 4
# Groups:   g [8]
       g x     date       action
   <int> <chr> <date>     <ord> 
 1     1 1_    2020-01-01 add   
 2     1 1_    2020-01-03 add   
 3     1 2_    2020-01-04 add   
 4     1 2_    2020-01-05 add   
 5     1 2_    2020-01-10 reduce
 6     1 1_    2020-01-14 reduce
 7     1 2_    2020-01-14 reduce
 8     1 1_    2020-01-15 reduce
 9     1 1_    2020-01-16 add   
10     1 1_    2020-01-18 add   
# ... with 34 more rows

The desired result may look like this:

head(as_tibble(df3_r),12)
# A tibble: 12 x 6
       X     g x     date       action result  
   <int> <int> <chr> <chr>      <chr>  <chr>   
 1     1     1 1_    01-01-2020 add    1_      
 2     2     1 1_    03-01-2020 add    1_1_    
 3     3     1 2_    04-01-2020 add    1_1_2_  
 4     4     1 2_    05-01-2020 add    1_1_2_2_
 5     5     1 2_    10-01-2020 reduce 1_1_2_  
 6     6     1 1_    14-01-2020 reduce 1_2_    
 7     7     1 2_    14-01-2020 reduce 1_      
 8     8     1 1_    15-01-2020 reduce NA      
 9     9     1 1_    16-01-2020 add    1_      
10    10     1 1_    18-01-2020 add    1_1_    
11    11     1 1_    19-01-2020 reduce 1_      
12    12     1 2_    22-01-2020 add    1_2_ 

Thanks in advance.

You can use accumulate2 from the purrr package.

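library(dplyr)
library(purrr)

# df is already grouped by g, so mutate() runs the accumulation within each group.
df %>% 
  mutate(result = unlist(accumulate2(
    # . is the running string, .y the current x, ..3 the current action;
    # action[-1L] skips the first record, which seeds the result with its x.
    x, action[-1L], ~if (..3 == "add") paste0(., .y) else sub(.y, "", .)
  )))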
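
For readers who would rather avoid the purrr dependency, the same rules can probably be expressed with base R's Reduce(..., accumulate = TRUE). This is only a sketch, not part of the original answer: the helper names step, roll_one and roll_by_group and the toy data frame are made up for illustration, and fixed = TRUE is an assumption that the ids should be matched literally rather than as regular expressions.

```r
# One step of the rule: append x[i] on "add", drop its first occurrence on "reduce".
step <- function(acc, i, x, action) {
  if (action[i] == "add") paste0(acc, x[i]) else sub(x[i], "", acc, fixed = TRUE)
}

# Rolling result for a single group; the first record seeds the result as-is.
roll_one <- function(x, action) {
  action <- as.character(action)
  if (length(x) < 2L) return(x)
  Reduce(function(acc, i) step(acc, i, x, action),
         seq_along(x)[-1L], x[1L], accumulate = TRUE)
}

# Apply roll_one() within each group, keeping the original row order.
roll_by_group <- function(x, action, g) {
  out <- character(length(x))
  for (idx in split(seq_along(x), g)) {
    out[idx] <- roll_one(x[idx], action[idx])
  }
  out
}

# Hypothetical toy data, not the data from the question.
toy <- data.frame(
  g      = c(1, 1, 1, 1, 2, 2),
  x      = c("1_", "2_", "1_", "2_", "3_", "3_"),
  action = c("add", "add", "reduce", "reduce", "add", "reduce")
)
toy$result <- roll_by_group(toy$x, toy$action, toy$g)
toy$result
# "1_" "1_2_" "2_" "" "3_" ""
```

On the question's data this would be called as roll_by_group(df$x, df$action, df$g). Like the sub() call above, it removes only the first occurrence of the id, so ids where one is a prefix or suffix of another (for example "1_" and "11_") would need a delimiter-aware match.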

Output

   g  x       date action   result
1  1 1_ 2020-01-01    add       1_
2  1 1_ 2020-01-03    add     1_1_
3  1 2_ 2020-01-04    add   1_1_2_
4  1 2_ 2020-01-05    add 1_1_2_2_
5  1 2_ 2020-01-10 reduce   1_1_2_
6  1 1_ 2020-01-14 reduce     1_2_
7  1 2_ 2020-01-14 reduce       1_
8  1 1_ 2020-01-15 reduce         
9  1 1_ 2020-01-16    add       1_
10 1 1_ 2020-01-18    add     1_1_
11 1 1_ 2020-01-19 reduce       1_
12 1 2_ 2020-01-22    add     1_2_
13 1 2_ 2020-01-25 reduce       1_
14 1 2_ 2020-01-26    add     1_2_
15 1 1_ 2020-01-27 reduce       2_
16 1 2_ 2020-01-30 reduce         
17 2 3_ 2020-01-01    add       3_
18 2 3_ 2020-01-03    add     3_3_
19 2 3_ 2020-01-14 reduce       3_
20 2 3_ 2020-01-15 reduce         
21 2 4_ 2020-01-16    add       4_
22 2 4_ 2020-01-27 reduce         
23 3 3_ 2020-01-14    add       3_
24 3 3_ 2020-01-22 reduce         
25 3 3_ 2020-01-30    add       3_
26 3 3_ 2020-02-05 reduce         
27 3 3_ 2020-02-09    add       3_
28 3 3_ 2020-02-17 reduce         
29 4 4_ 2020-04-14    add       4_
30 4 5_ 2020-04-22    add     4_5_
31 4 4_ 2020-04-23 reduce       5_
32 4 5_ 2020-04-29 reduce         
33 5 6_ 2020-02-17    add       6_
34 5 6_ 2020-02-27    add     6_6_
35 5 6_ 2020-03-07 reduce       6_
36 5 6_ 2020-03-26 reduce         
37 6 7_ 2020-04-12    add       7_
38 6 7_ 2020-04-15 reduce         
39 7 7_ 2020-02-11    add       7_
40 7 7_ 2020-02-17 reduce         
41 8 8_ 2020-03-07    add       8_
42 8 8_ 2020-03-20 reduce         
43 8 8_ 2020-03-20    add       8_
44 8 8_ 2020-03-23 reduce         
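
Note that rows where everything has been removed come out as empty strings here, whereas the desired result in the question shows NA. If NA is wanted, a small post-processing step along these lines should work; the result vector below is a made-up stand-in for the computed column.

```r
# Stand-in for the computed result column (illustrative values only).
result <- c("1_", "1_1_", "1_", "")

# Replace empty strings with NA so fully reduced rows are marked as missing.
result[!nzchar(result)] <- NA_character_
result
# "1_" "1_1_" "1_" NA
```

Inside a dplyr pipeline, mutate(result = na_if(result, "")) should have the same effect.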
