R: Join lists with group_by and summarize

Let's say I have multiple lines within my call center. For each line I know the timestamp when a call was taken (t_Calltaken) and when it was finished (t_Hangup).

library(tidyverse)
library(lubridate)

Callcenter <- data.frame(line=c("A","B","C","D","A","B","D","A","C","D"),
           t_Calltaken=c("2019-01-01 00:10:50", "2019-01-01 00:12:30","2019-01-01 00:17:00","2019-01-01 00:20:50","2019-01-01 00:35:20","2019-01-01 00:42:50","2019-01-01 00:48:50","2019-01-01 01:03:20","2019-01-01 01:10:50","2019-01-01 01:23:50"),
           t_Hangup=c("2019-01-01 00:33:10", "2019-01-01 00:35:10","2019-01-01 01:07:33","2019-01-01 00:38:50","2019-01-01 00:49:27","2019-01-01 01:22:40","2019-01-01 01:10:41","2019-01-01 01:26:10","2019-01-01 01:47:44","2019-01-01 01:51:15"))

I now want to find the maximum number of lines occupied at the same time over a year. A resolution of minutes is fine, so I calculated the difference in minutes between the beginning of the year (e.g. 2019-01-01 00:00:00) and each of t_Calltaken and t_Hangup to get something like a minute-id.

For each call I can get the blocked minute-ids with seq(start_minute_id, end_minute_id, by=1), i.e. every minute-id from the start of the call to its end.

Callcenter %>% 
  # minutes since the start of the year, rounded to whole minutes
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",units="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",units="mins")))) %>% 
  rowwise() %>% 
  # one list entry per call holding every minute-id the call occupies
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1)))

# A tibble: 10 × 6
# Rowwise: 
   line  t_Calltaken         t_Hangup            start_minute_id end_minute_id blocked_minutes
   <chr> <chr>               <chr>                         <dbl>         <dbl> <list>         
 1 A     2019-01-01 00:10:50 2019-01-01 00:33:10              11            33 <dbl [23]>     
 2 B     2019-01-01 00:12:30 2019-01-01 00:35:10              12            35 <dbl [24]>     
 3 C     2019-01-01 00:17:00 2019-01-01 01:07:33              17            68 <dbl [52]>     
 4 D     2019-01-01 00:20:50 2019-01-01 00:38:50              21            39 <dbl [19]>     
 5 A     2019-01-01 00:35:20 2019-01-01 00:49:27              35            49 <dbl [15]>     
 6 B     2019-01-01 00:42:50 2019-01-01 01:22:40              43            83 <dbl [41]>     
 7 D     2019-01-01 00:48:50 2019-01-01 01:10:41              49            71 <dbl [23]>     
 8 A     2019-01-01 01:03:20 2019-01-01 01:26:10              63            86 <dbl [24]>     
 9 C     2019-01-01 01:10:50 2019-01-01 01:47:44              71           108 <dbl [38]>     
10 D     2019-01-01 01:23:50 2019-01-01 01:51:15              84           111 <dbl [28]>  

I would now like to group by line and join all the lists of blocked minute-ids together.

How can I do this?

In the next step I want to count the occurrences of each blocked minute-id to get the maximum number of lines blocked in parallel. Is there another, more efficient way?

Edit:

I would expect output like this, for example:

  line
1    A
2    B
3    C
4    D
                                                                                                                                                                                                                                                                                                                                                                     blocked_minutes
1                                                                                                                          c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86)
2                                                                                                              c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83)
3 c(17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108)
4                                                                              c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111)

Not totally sure if this is what you're going for, but:

Callcenter %>% 
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",units="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",units="mins")))) %>% 
  rowwise() %>% 
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>% 
  # one row per blocked minute-id
  unnest_longer(blocked_minutes) %>% 
  # collapse back to one row per line, with every column as a list
  group_by(line) %>% 
  nest() %>% 
  unnest_wider(col = data)

# A tibble: 4 x 6
# Groups:   line [4]
  line  t_Calltaken t_Hangup   start_minute_id end_minute_id blocked_minutes
  <chr> <list>      <list>     <list>          <list>        <list>         
1 A     <chr [62]>  <chr [62]> <dbl [62]>      <dbl [62]>    <dbl [62]>     
2 B     <chr [65]>  <chr [65]> <dbl [65]>      <dbl [65]>    <dbl [65]>     
3 C     <chr [90]>  <chr [90]> <dbl [90]>      <dbl [90]>    <dbl [90]>     
4 D     <chr [70]>  <chr [70]> <dbl [70]>      <dbl [70]>    <dbl [70]>  
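
If only line and the combined blocked_minutes are wanted, as in the expected output of the question, the lists can probably also be joined directly in summarise(). The following is a minimal sketch under some assumptions: the helper to_minute_id() and the object names origin and blocked_by_line are made up for illustration, the timestamps are parsed as UTC, and Map() is used in place of rowwise().

```r
library(dplyr)

Callcenter <- data.frame(
  line = c("A","B","C","D","A","B","D","A","C","D"),
  t_Calltaken = c("2019-01-01 00:10:50", "2019-01-01 00:12:30", "2019-01-01 00:17:00",
                  "2019-01-01 00:20:50", "2019-01-01 00:35:20", "2019-01-01 00:42:50",
                  "2019-01-01 00:48:50", "2019-01-01 01:03:20", "2019-01-01 01:10:50",
                  "2019-01-01 01:23:50"),
  t_Hangup = c("2019-01-01 00:33:10", "2019-01-01 00:35:10", "2019-01-01 01:07:33",
               "2019-01-01 00:38:50", "2019-01-01 00:49:27", "2019-01-01 01:22:40",
               "2019-01-01 01:10:41", "2019-01-01 01:26:10", "2019-01-01 01:47:44",
               "2019-01-01 01:51:15")
)

# Assumption: all timestamps are interpreted as UTC
origin <- as.POSIXct("2019-01-01 00:00:00", tz = "UTC")

# Hypothetical helper: whole minutes since the start of the year
to_minute_id <- function(x) {
  round(as.numeric(difftime(as.POSIXct(x, tz = "UTC"), origin, units = "mins")))
}

blocked_by_line <- Callcenter %>%
  mutate(start_minute_id = to_minute_id(t_Calltaken),
         end_minute_id   = to_minute_id(t_Hangup),
         # one vector of minute-ids per call
         blocked_minutes = Map(seq, start_minute_id, end_minute_id)) %>%
  group_by(line) %>%
  # concatenate the per-call vectors of each line into a single vector
  summarise(blocked_minutes = list(unlist(blocked_minutes)), .groups = "drop")
```

blocked_by_line then has one row per line and a single list column, so lengths(blocked_by_line$blocked_minutes) gives the number of blocked minutes per line.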

Thanks to the answer from @Ben G, I was able to get to the solution for my long-term goal: finding the maximum number of lines blocked at the same time.

With unnest_longer() the lists within blocked_minutes can be unnested, and then I can get the result just by reshaping the data frame.

Callcenter %>% 
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",units="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",units="mins")))) %>% 
  rowwise() %>% 
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>% 
  # one row per call and blocked minute-id
  unnest_longer(blocked_minutes) %>% 
  mutate(value=1) %>% 
  # one row per minute-id, one column per line (1 = blocked, NA = free)
  pivot_wider(id_cols=blocked_minutes, names_from="line",values_from="value") %>% 
  # count the blocked lines in each minute, then take the maximum
  mutate(sum_blocked_lines=rowSums(.[,2:ncol(.)],na.rm=TRUE)) %>% 
  summarize(max_blocked_lines=max(sum_blocked_lines))

Results in:

# A tibble: 1 × 1
  max_blocked_lines
              <dbl>
1                 4
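
On the question of a more efficient way: for a whole year of calls, expanding every call into one row per minute can become large. One common alternative, which is not from the answers above, is to count events instead: +1 at the minute a call starts, -1 at the minute after it ends, and then take the running sum. The sketch below is only an illustration under assumptions: to_minute_id(), origin, calls, events and occupancy are invented names, timestamps are parsed as UTC, and it counts parallel calls, which equals the number of blocked lines only if each line carries at most one call at a time.

```r
library(dplyr)

Callcenter <- data.frame(
  line = c("A","B","C","D","A","B","D","A","C","D"),
  t_Calltaken = c("2019-01-01 00:10:50", "2019-01-01 00:12:30", "2019-01-01 00:17:00",
                  "2019-01-01 00:20:50", "2019-01-01 00:35:20", "2019-01-01 00:42:50",
                  "2019-01-01 00:48:50", "2019-01-01 01:03:20", "2019-01-01 01:10:50",
                  "2019-01-01 01:23:50"),
  t_Hangup = c("2019-01-01 00:33:10", "2019-01-01 00:35:10", "2019-01-01 01:07:33",
               "2019-01-01 00:38:50", "2019-01-01 00:49:27", "2019-01-01 01:22:40",
               "2019-01-01 01:10:41", "2019-01-01 01:26:10", "2019-01-01 01:47:44",
               "2019-01-01 01:51:15")
)

# Assumption: all timestamps are interpreted as UTC
origin <- as.POSIXct("2019-01-01 00:00:00", tz = "UTC")

# Hypothetical helper: whole minutes since the start of the year
to_minute_id <- function(x) {
  round(as.numeric(difftime(as.POSIXct(x, tz = "UTC"), origin, units = "mins")))
}

calls <- Callcenter %>%
  mutate(start_minute_id = to_minute_id(t_Calltaken),
         end_minute_id   = to_minute_id(t_Hangup))

# Two events per call: +1 at the start minute, -1 at the minute after the
# end minute (the end minute itself still counts as blocked)
events <- bind_rows(
  data.frame(minute_id = calls$start_minute_id,   delta = 1),
  data.frame(minute_id = calls$end_minute_id + 1, delta = -1)
)

occupancy <- events %>%
  group_by(minute_id) %>%
  # net change per minute, so ties between starts and ends are handled together
  summarise(delta = sum(delta), .groups = "drop") %>%
  arrange(minute_id) %>%
  # running total = number of calls active from this minute onwards
  mutate(blocked_lines = cumsum(delta))

max_blocked_lines <- max(occupancy$blocked_lines)
```

The size of events is two rows per call regardless of call duration, so the work grows with the number of calls rather than with the number of blocked minutes.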
