R: Iterating through each group and dynamically assigning values
I have the following dataset:
| id | row_name | start_date | end_date | rows_overlap_period |
|---|---|---|---|---|
| person_1 | 1 | 2010-04-23 | 2010-06-22 | 2,3,4,5,6 |
| person_1 | 2 | 2010-04-25 | 2010-06-24 | 3,4,5,6 |
| person_1 | 3 | 2010-04-27 | 2010-06-26 | 4,5,6,7 |
| person_1 | 4 | 2010-04-29 | 2010-06-28 | 5,6,7,8 |
| person_1 | 5 | 2010-04-30 | 2010-06-29 | 6,7,8 |
| person_1 | 6 | 2010-05-08 | 2010-07-07 | 7,8 |
| person_1 | 7 | 2010-06-26 | 2010-08-25 | 8 |
| person_1 | 8 | 2010-06-28 | 2010-08-27 | |
| person_2 | 9 | 2010-07-30 | 2010-09-28 | 10 |
| person_2 | 10 | 2010-08-02 | 2010-10-01 | |
The "rows_overlap_period" column lists the other records of the same "id" whose "start_date" falls between this record's "start_date" and "end_date" (inclusive).
However, I would like to iterate within each group to arrive at the following result:
| id | row_name | start_date | end_date | rows_overlap_period |
|---|---|---|---|---|
| person_1 | 1 | 2010-04-23 | 2010-06-22 | 2,3,4,5,6 |
| person_1 | 2 | 2010-04-25 | 2010-06-24 | |
| person_1 | 3 | 2010-04-27 | 2010-06-26 | |
| person_1 | 4 | 2010-04-29 | 2010-06-28 | |
| person_1 | 5 | 2010-04-30 | 2010-06-29 | |
| person_1 | 6 | 2010-05-08 | 2010-07-07 | |
| person_1 | 7 | 2010-06-26 | 2010-08-25 | 8 |
| person_1 | 8 | 2010-06-28 | 2010-08-27 | |
| person_2 | 9 | 2010-07-30 | 2010-09-28 | 10 |
| person_2 | 10 | 2010-08-02 | 2010-10-01 | |
This output would be the result of the following algorithm:
For each group:
Reproducible example (what I have so far):

```r
# Input data
data.frame(id = c("person_1", "person_1", "person_1", "person_1", "person_1",
                  "person_1", "person_1", "person_1", "person_2",
                  "person_2"),
           row_name = 1:10,
           start_date = as.Date(c("2010-04-23", "2010-04-25", "2010-04-27",
                                  "2010-04-29", "2010-04-30", "2010-05-08",
                                  "2010-06-26", "2010-06-28", "2010-07-30",
                                  "2010-08-02")),
           end_date = as.Date(c("2010-06-22", "2010-06-24", "2010-06-26",
                                "2010-06-28", "2010-06-29", "2010-07-07",
                                "2010-08-25", "2010-08-27", "2010-09-28",
                                "2010-10-01"))) -> data

# Find overlaps (column rows_overlap_period): self-join each row to the other
# rows of the same id whose start_date lies within its [start_date, end_date];
# rows with no overlap get a single space
sqldf::sqldf("select a.*,
                     coalesce(group_concat(b.row_name), ' ') as rows_overlap_period
              from data a
              left join data b on
                   a.id = b.id and
                   not a.row_name = b.row_name and
                   (b.start_date between
                    a.start_date and a.end_date)
              group by a.rowid
              order by a.rowid") -> data
```
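For comparison, the same overlap column can be sketched in base R without SQL. Like the self-join, it compares every pair of rows, so it is quadratic in the number of rows. `overlap_rows()` is a hypothetical helper, and unlike the sqldf query it returns an empty string rather than a space for rows with no overlap:

```r
# Hypothetical base-R version of the overlap column (a sketch, not the asker's code):
# for each row, list the other rows of the same id whose start_date lies in
# [start_date, end_date], inclusive like SQL BETWEEN.
overlap_rows <- function(df) {
  vapply(seq_len(nrow(df)), function(i) {
    hit <- df$id == df$id[i] &
      df$row_name != df$row_name[i] &
      df$start_date >= df$start_date[i] &
      df$start_date <= df$end_date[i]
    paste(df$row_name[hit], collapse = ",")   # "" when nothing overlaps
  }, character(1))
}

data <- data.frame(
  id = rep(c("person_1", "person_2"), times = c(8, 2)),
  row_name = 1:10,
  start_date = as.Date(c("2010-04-23", "2010-04-25", "2010-04-27", "2010-04-29",
                         "2010-04-30", "2010-05-08", "2010-06-26", "2010-06-28",
                         "2010-07-30", "2010-08-02")),
  end_date = as.Date(c("2010-06-22", "2010-06-24", "2010-06-26", "2010-06-28",
                       "2010-06-29", "2010-07-07", "2010-08-25", "2010-08-27",
                       "2010-09-28", "2010-10-01")),
  stringsAsFactors = FALSE
)
data$rows_overlap_period <- overlap_rows(data)
```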
I have tried to find a solution using dplyr, data.table, or sqldf directly, but I can't find a way to avoid 'loops within loops', which would degrade performance a lot.
Does anyone have any suggestions on how I can achieve this?
We could create a grouping column, in addition to the 'id' column, to do this.
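The grouping column comes from the `lead()`/`cumsum()` step inside `mutate()`. As a base-R illustration on person_1's eight overlap strings (with `c(x[-1], FALSE)` standing in for `dplyr::lead(x, default = FALSE)`), the row just before the empty overlap list starts a new block:

```r
# Base-R illustration of the grouping step for person_1
overlaps <- c("2,3,4,5,6", "3,4,5,6", "4,5,6,7", "5,6,7,8",
              "6,7,8", "7,8", "8", " ")   # sqldf pads "no overlap" with " "
is_empty <- !nzchar(trimws(overlaps))      # TRUE only for the last row
next_is_empty <- c(is_empty[-1], FALSE)    # shift up by one, pad with FALSE
grp <- cumsum(next_is_empty)
grp
#> [1] 0 0 0 0 0 0 1 1
```

Keeping only `row_number() == 1` within each (id, grp) block then retains the lists on rows 1 and 7.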
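```r
library(dplyr)

data %>%
  group_by(id) %>%
  # the row just before each row with an empty overlap list starts a new block
  mutate(grp = cumsum(lead(!nzchar(trimws(rows_overlap_period)),
                           default = FALSE))) %>%
  group_by(grp, .add = TRUE) %>%
  # keep the overlap list only on the first row of each (id, grp) block
  mutate(rows_overlap_period = case_when(row_number() == 1 ~
                                           rows_overlap_period, TRUE ~ "")) %>%
  ungroup %>%
  select(-grp)
```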
-output

```
# A tibble: 10 × 5
   id       row_name start_date end_date   rows_overlap_period
   <chr>       <int> <date>     <date>     <chr>              
 1 person_1        1 2010-04-23 2010-06-22 "2,3,4,5,6"        
 2 person_1        2 2010-04-25 2010-06-24 ""                 
 3 person_1        3 2010-04-27 2010-06-26 ""                 
 4 person_1        4 2010-04-29 2010-06-28 ""                 
 5 person_1        5 2010-04-30 2010-06-29 ""                 
 6 person_1        6 2010-05-08 2010-07-07 ""                 
 7 person_1        7 2010-06-26 2010-08-25 "8"                
 8 person_1        8 2010-06-28 2010-08-27 ""                 
 9 person_2        9 2010-07-30 2010-09-28 "10"               
10 person_2       10 2010-08-02 2010-10-01 ""                 
```
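The `cumsum()` grouping ties block boundaries to where empty overlap lists occur, which lines up with this data. If the intended rule is the greedy one the desired table suggests (a row keeps its list unless an earlier kept list already names it), it can also be written as a single pass over each group's rows. This is a sketch under that assumption; `greedy_overlaps()` is a hypothetical helper, not the asker's stated algorithm:

```r
# Hypothetical helper (an assumed reading of the rule, not the asker's algorithm):
# walk one group's rows in order; a row not yet named by an earlier kept list
# keeps its own list, and every row in that list is marked as covered.
greedy_overlaps <- function(row_name, overlaps) {
  covered <- integer(0)
  out <- character(length(overlaps))          # all "" to start with
  for (i in seq_along(overlaps)) {
    if (row_name[i] %in% covered) next        # covered rows stay ""
    out[i] <- trimws(overlaps[i])             # this row keeps its list
    listed <- trimws(strsplit(out[i], ",", fixed = TRUE)[[1]])
    covered <- c(covered, as.integer(listed[nzchar(listed)]))
  }
  out
}

# Overlap column as produced by the sqldf query (" " means no overlap)
data <- data.frame(
  id = rep(c("person_1", "person_2"), times = c(8, 2)),
  row_name = 1:10,
  rows_overlap_period = c("2,3,4,5,6", "3,4,5,6", "4,5,6,7", "5,6,7,8",
                          "6,7,8", "7,8", "8", " ", "10", " "),
  stringsAsFactors = FALSE
)

# Apply the helper within each id and put the results back in row order
data$rows_overlap_period <- unsplit(
  lapply(split(data, data$id),
         function(g) greedy_overlaps(g$row_name, g$rows_overlap_period)),
  data$id
)
```

On this data it yields the same column as the dplyr answer. The two can diverge elsewhere: for a group whose lists are "2", "3", " ", the `cumsum()` version keeps "3" on the second row, while the greedy pass blanks it because the first row's list already names row 2.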