R: new variable by group, looped on multiple lagged and lead values
Let's say I have three variables, id, date and trad (trad has 3 values and can take any of them at any point in time):
library(tidyverse)
dput(df)
structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 2, 2, 2), date = structure(c(16436, 16437, 16438, 16439,
16440, 16441, 16442, 16443, 16444, 16445, 16446, 16447, 16448,
16449, 16450, 16451, 16452, 16453, 16454), class = "Date"), trad = c("Free",
"Suspended", "Suspended", "Free", "Suspended", "Withdrawn", "Withdrawn",
"Free", "Withdrawn", "Free", "Free", "Withdrawn", "Suspended",
"Withdrawn", "Withdrawn", "Free", "Withdrawn", "Suspended", "Free"
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-19L), spec = structure(list(cols = list(id = structure(list(), class = c("collector_double",
"collector")), date = structure(list(format = "%d/%m/%Y"), class = c("collector_date",
"collector")), trad = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
df
# A tibble: 19 x 3
      id date       trad
   <dbl> <date>     <chr>
 1     1 2015-01-01 Free
 2     1 2015-01-02 Suspended
 3     1 2015-01-03 Suspended
 4     1 2015-01-04 Free
 5     1 2015-01-05 Suspended
 6     1 2015-01-06 Withdrawn
 7     1 2015-01-07 Withdrawn
 8     1 2015-01-08 Free
 9     1 2015-01-09 Withdrawn
10     1 2015-01-10 Free
11     1 2015-01-11 Free
12     1 2015-01-12 Withdrawn
13     1 2015-01-13 Suspended
14     1 2015-01-14 Withdrawn
15     1 2015-01-15 Withdrawn
16     1 2015-01-16 Free
17     2 2015-01-17 Withdrawn
18     2 2015-01-18 Suspended
19     2 2015-01-19 Free
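I want to generate new columns with the start date and end date of each period. A period starts when trad moves to "Withdrawn"; if the "Withdrawn" row is preceded by a "Suspended" row, the start date moves back to that row. If there are multiple "Suspended" rows before the "Withdrawn", the period starts from the first "Suspended". Similarly, the end date is the date on which trad becomes "Free" after entering "Withdrawn". This is the required final dataset: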
dfnew
# A tibble: 19 x 6
      id date       trad      start      end        period
   <dbl> <date>     <chr>     <date>     <date>      <dbl>
 1     1 2015-01-01 Free      NA         NA             NA
 2     1 2015-01-02 Suspended NA         NA             NA
 3     1 2015-01-03 Suspended NA         NA             NA
 4     1 2015-01-04 Free      NA         NA             NA
 5     1 2015-01-05 Suspended 2015-01-05 NA              1
 6     1 2015-01-06 Withdrawn NA         NA              1
 7     1 2015-01-07 Withdrawn NA         NA              1
 8     1 2015-01-08 Free      NA         2015-01-08      1
 9     1 2015-01-09 Withdrawn 2015-01-09 NA              2
10     1 2015-01-10 Free      NA         2015-01-10      2
11     1 2015-01-11 Free      NA         NA             NA
12     1 2015-01-12 Withdrawn 2015-01-12 NA              3
13     1 2015-01-13 Suspended NA         NA              3
14     1 2015-01-14 Withdrawn NA         NA              3
15     1 2015-01-15 Withdrawn NA         NA              3
16     1 2015-01-16 Free      NA         2015-01-16     NA
17     2 2015-01-17 Withdrawn 2015-01-17 NA              1
18     2 2015-01-18 Suspended NA         NA              1
19     2 2015-01-19 Free      NA         2015-01-19      1
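One way to apply the rules above without enumerating lag/lead conditions is to walk each id's rows once and carry a little state (inside a period or not, and where the current run of "Suspended" rows began). The sketch below is a base R illustration, not code from the question or answer: mark_periods() is a hypothetical helper name, and it assumes rows are sorted by date within id, that trad has no NA values, and that "multiple Suspended rows before Withdrawn" means the unbroken run of "Suspended" rows directly before it.

```r
# Hypothetical helper (not from the question): applies the start/end rules to
# one id's rows, assumed sorted by date and free of NA in trad.
mark_periods <- function(trad, date) {
  n <- length(trad)
  start  <- rep(as.Date(NA), n)
  end    <- rep(as.Date(NA), n)
  period <- rep(NA_real_, n)
  open      <- FALSE  # TRUE while inside a period
  run_start <- NA     # index of the first "Suspended" in the current run
  k <- 0              # period counter for this id
  for (i in seq_len(n)) {
    if (open) {
      period[i] <- k
      if (trad[i] == "Free") {   # first "Free" after the period opened closes it
        end[i] <- date[i]
        open <- FALSE
      }
    } else if (trad[i] == "Suspended") {
      if (is.na(run_start)) run_start <- i  # remember where the run began
    } else if (trad[i] == "Withdrawn") {
      k <- k + 1
      first <- if (is.na(run_start)) i else run_start  # move back over the run
      start[first] <- date[first]
      period[first:i] <- k
      open <- TRUE
      run_start <- NA
    } else {                     # "Free" outside a period resets the run
      run_start <- NA
    }
  }
  data.frame(start = start, end = end, period = period)
}

# Same data as the dput() output, rebuilt with base R only
df <- data.frame(
  id   = c(rep(1, 16), rep(2, 3)),
  date = as.Date("2015-01-01") + 0:18,
  trad = c("Free", "Suspended", "Suspended", "Free", "Suspended", "Withdrawn",
           "Withdrawn", "Free", "Withdrawn", "Free", "Free", "Withdrawn",
           "Suspended", "Withdrawn", "Withdrawn", "Free", "Withdrawn",
           "Suspended", "Free")
)
df <- df[order(df$id, df$date), ]
# Run the helper separately for each id and stack the results
dfnew <- do.call(rbind, lapply(split(df, df$id), function(g) {
  cbind(g, mark_periods(g$trad, g$date))
}))
rownames(dfnew) <- NULL
dfnew
```

This reproduces the start and end columns of the required output. It labels the closing "Free" row with its period number, as rows 8, 10 and 19 of the required output do; row 16 there shows NA instead, so that one row differs.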
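There is no pattern in trad, so you can have any sequence of "Withdrawn"/"Suspended" before "Free". That means a solution like the one below doesn't work (in theory it could, but I would need far too many conditions to implement it):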
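dfnew <- df %>%
  group_by(id) %>%
  # only catches a "Withdrawn" directly after "Free"; every other sequence needs its own condition
  mutate(start = if_else(trad == "Withdrawn" & lag(trad) == "Free", date, as.Date(NA)))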
Related questions I found were helpful, but don't answer this one.
Can anyone provide a flexible solution?
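Not very flexible, but at least it's an attempt.
I wasn't sure what should happen with a sequence like "Suspended", "Suspended", "Withdrawn", "Withdrawn".
For example, change trad on 2015-01-04 to "Suspended". What is the start date in that case? I provide 2 solutions: with the first, the start date is 2015-01-02; with the second, it is 2015-01-05.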
dfnew1 <- df %>%
  mutate(startGroups = cumsum(trad == "Free")) %>%
  group_by(startGroups) %>% # make a group from every occurrence of "Free" in trad
  mutate(wds = cumsum(trad == "Withdrawn"),
         # if there is any "Withdrawn" in the group, set start on the row right after "Free";
         # if_else() with as.Date(NA) keeps the Date class that ifelse() would drop
         start = if_else(max(wds) > 0 & row_number() == 2, date, as.Date(NA))
  ) %>%
  ungroup() %>%
  mutate(endGroups = cumsum(!is.na(start))) %>%
  group_by(endGroups) %>% # group on every open trade now
  mutate(frees = cumsum(trad == "Free"),
         # end on the first occurrence of "Free" after the start
         end = if_else(trad == "Free" & frees == 1 & endGroups > 0, date, as.Date(NA))
  ) %>%
  ungroup() # %>% select(-c(startGroups, wds, endGroups, frees)) # remove helper cols
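The first solution rests on cumsum(trad == "Free"): every "Free" bumps a counter, so each "Free" row and the rows after it (up to the next "Free") share a group id. The base R sketch below shows that grouping and the start rule on the first 11 trad values; ave() stands in for dplyr's group_by() + mutate() so the example runs without extra packages, and the variable names are illustrative only.

```r
# First 11 trad values from the question's data
trad <- c("Free", "Suspended", "Suspended", "Free", "Suspended", "Withdrawn",
          "Withdrawn", "Free", "Withdrawn", "Free", "Free")
# Every "Free" bumps the counter, so each "Free" opens a new group
start_groups <- cumsum(trad == "Free")
# Per row: does this row's group contain any "Withdrawn"?
group_has_wd <- ave(trad == "Withdrawn", start_groups, FUN = any)
# Position of each row inside its group (1 = the "Free" row itself)
pos_in_group <- ave(seq_along(trad), start_groups, FUN = seq_along)
# Start rule of the first solution: the row right after "Free",
# but only in groups that contain a "Withdrawn"
start_rows <- which(group_has_wd & pos_in_group == 2)
start_groups  # 1 1 1 2 2 2 2 3 3 4 5
start_rows    # 5 9
```

Rows 5 and 9 are 2015-01-05 and 2015-01-09, the first two start dates in the required output.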
dfnew2 <- df %>%
  mutate(startGroups = cumsum(trad == "Free")) %>%
  group_by(startGroups) %>% # make a group from every occurrence of "Free" in trad
  mutate(wds = cumsum(trad == "Withdrawn"),
         # start on the "Suspended" right before the group's first "Withdrawn",
         # or on that "Withdrawn" itself if it is not preceded by "Suspended"
         start = if_else(
           (trad == "Suspended" & lead(trad) == "Withdrawn" & lead(wds) == 1) |
             (trad == "Withdrawn" & lag(trad, default = "") != "Suspended" & wds == 1),
           date, as.Date(NA))
  ) %>%
  ungroup() %>%
  mutate(endGroups = cumsum(!is.na(start))) %>%
  group_by(endGroups) %>%
  mutate(frees = cumsum(trad == "Free"),
         # end on the first occurrence of "Free" after the start
         end = if_else(trad == "Free" & frees == 1 & endGroups > 0, date, as.Date(NA))
  ) %>%
  ungroup() # %>% select(-c(startGroups, wds, endGroups, frees))
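To see where the two solutions disagree, the sketch below applies both start rules to the modified example (trad on 2015-01-04 changed to "Suspended") in base R. lead_by() and lag_by() are hypothetical helpers that mimic dplyr's lead()/lag() within groups; %in% is used so that the NA produced at a group's last row counts as FALSE.

```r
# Hypothetical helpers: lead/lag within groups, like dplyr's lead()/lag()
# after group_by(), written in base R so the sketch has no dependencies
lead_by <- function(x, g) ave(x, g, FUN = function(v) c(v[-1], NA))
lag_by  <- function(x, g, default) ave(x, g, FUN = function(v) c(default, v[-length(v)]))

# Modified example: trad on 2015-01-04 changed to "Suspended"
trad  <- c("Free", "Suspended", "Suspended", "Suspended", "Suspended",
           "Withdrawn", "Withdrawn", "Free")
dates <- as.Date("2015-01-01") + 0:7
g     <- cumsum(trad == "Free")
wds   <- ave(as.integer(trad == "Withdrawn"), g, FUN = cumsum)

# Solution 1: row right after "Free" in a group containing "Withdrawn"
sol1 <- ave(trad == "Withdrawn", g, FUN = any) &
  ave(seq_along(trad), g, FUN = seq_along) == 2
# Solution 2: the "Suspended" just before the group's first "Withdrawn",
# or that "Withdrawn" itself when no "Suspended" precedes it
sol2 <- (trad == "Suspended" & lead_by(trad, g) %in% "Withdrawn" & lead_by(wds, g) %in% 1) |
  (trad == "Withdrawn" & lag_by(trad, g, "") != "Suspended" & wds == 1)

dates[sol1]  # "2015-01-02"
dates[sol2]  # "2015-01-05"
```

The first rule takes the whole run of "Suspended" rows since the last "Free"; the second only looks one row back from the first "Withdrawn".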