Group_by and summarise and preserve initial order without arrange in R
I need to preserve row order after group_by and summarise.
Here is the initial dataset:
movmnt_id <- c("101", "101", "351", "601","601","351")
plant <- c("F5P", "F5D", "F5P", "F5D","F5D", "RUP")
loc <- c("CB00", "CB00", "CB00", "CB00","CB00","MOS1")
qty <- c(100, 100,100,10,90,88)
date <- c("2018-01-05","2018-01-05","2018-01-05","2018-01-11","2018-01-11","2018-01-22" )
time <- c("10:38:38","10:47:17", "10:47:09","17:20:31","17:20:24","12:00:54" )
df <- data.frame(movmnt_id, plant, loc, qty,date, time)
df
movmnt_id plant loc qty date time
1 101 F5P CB00 100 2018-01-05 10:38:38
2 101 F5D CB00 100 2018-01-05 10:47:17
3 351 F5P CB00 100 2018-01-05 10:47:09
4 601 F5D CB00 10 2018-01-11 17:20:31
5 601 F5D CB00 90 2018-01-11 17:20:24
6 351 RUP MOS1 88 2018-01-22 12:00:54
First I need to order the rows according to specific conditions (the dataset and conditions here are significantly simplified). I do it like this:
library(dplyr) # provides %>% and the dplyr verbs
df2 <- df %>%
dplyr::group_by( movmnt_id, plant, loc,date,time) %>%
dplyr::summarise(total_qty = sum(qty)) %>%
dplyr::arrange( date,time) %>%
dplyr::ungroup()
df2
movmnt_id plant loc date time total_qty
<fct> <fct> <fct> <fct> <fct> <dbl>
1 101 F5P CB00 2018-01-05 10:38:38 100
2 351 F5P CB00 2018-01-05 10:47:09 100
3 101 F5D CB00 2018-01-05 10:47:17 100
4 601 F5D CB00 2018-01-11 17:20:24 90
5 601 F5D CB00 2018-01-11 17:20:31 10
6 351 RUP MOS1 2018-01-22 12:00:54 88
This result is OK. Then I need to drop the timestamp and summarise by qty.
My last attempt looks like this:
df3 <- df2 %>%
dplyr::group_by( movmnt_id, plant, loc,date) %>%
dplyr::summarise(total_qty = sum(total_qty)) %>%
dplyr::ungroup()
df3
movmnt_id plant loc date total_qty
<fct> <fct> <fct> <fct> <dbl>
1 101 F5D CB00 2018-01-05 100
2 101 F5P CB00 2018-01-05 100
3 351 F5P CB00 2018-01-05 100
4 351 RUP MOS1 2018-01-22 88
5 601 F5D CB00 2018-01-11 100
This is not OK: I am losing the previous order.
What I need is one row for movmnt_id = 601 and the same order as in df2: movmnt_id = 351 with date 2018-01-05 should sit between the two 101 movements with the same date:
movmnt_id plant loc date time total_qty
<fct> <fct> <fct> <fct> <fct> <dbl>
1 101 F5P CB00 2018-01-05 10:38:38 100
2 351 F5P CB00 2018-01-05 10:47:09 100
3 101 F5D CB00 2018-01-05 10:47:17 100
4 601 F5D CB00 2018-01-11 17:20:24 100
5 351 RUP MOS1 2018-01-22 12:00:54 88
Basically, if all values in the grouping condition are the same except qty, these rows can be summed; if not, the order has to be kept.
How can I do it?
To maintain the same order as in df2, you can create a unique key and use match.
cols <- c('movmnt_id', 'plant', 'loc', 'date')
df3 <- df3[order(match(do.call(paste, df3[cols]), do.call(paste, df2[cols]))), ]
df3
# movmnt_id plant loc date total_qty
# <chr> <chr> <chr> <chr> <dbl>
#1 101 F5P CB00 2018-01-05 100
#2 351 F5P CB00 2018-01-05 100
#3 101 F5D CB00 2018-01-05 100
#4 601 F5D CB00 2018-01-11 100
#5 351 RUP MOS1 2018-01-22 88
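The key-and-match idea above can be wrapped in a small helper. This is only a sketch: the function name reorder_like, the toy data frames, and the "\r" separator are assumptions made for illustration, not part of the original answer. The explicit separator is there because plain paste() joins with a space, so two different column combinations could in principle produce the same key.

```r
# Hypothetical helper (not from the original answer): reorder the rows of `x`
# so they follow the first appearance of each key combination in `ref`.
reorder_like <- function(x, ref, cols) {
  # A separator unlikely to occur in the data avoids accidental key collisions:
  # paste("a b", "c") and paste("a", "b c") would both give "a b c".
  key_x   <- do.call(paste, c(x[cols],   sep = "\r"))
  key_ref <- do.call(paste, c(ref[cols], sep = "\r"))
  # match() returns the position of the first occurrence in key_ref;
  # keys missing from `ref` give NA, which order() places last.
  x[order(match(key_x, key_ref)), , drop = FALSE]
}

# Toy data, made up for illustration
ref <- data.frame(id = c("b", "a", "b", "c"), grp = c("x", "x", "x", "y"))
x   <- data.frame(id = c("a", "b", "c"), grp = c("x", "x", "y"), n = c(1, 2, 1))

out <- reorder_like(x, ref, c("id", "grp"))
out
```

With the data from the question, the call would be reorder_like(df3, df2, c('movmnt_id', 'plant', 'loc', 'date')).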
Here I make a factor, called "key", whose levels follow the chronological order of the id / plant / loc combinations. When we then aggregate by it (using count as a shortcut for group_by %>% summarize), count uses it to order the output.
library(dplyr)   # arrange, mutate, count and %>%
library(forcats) # alternatively, load both with library(tidyverse)
df %>%
arrange(date, time) %>%
mutate(key = paste(movmnt_id, plant, loc) %>% as_factor %>% fct_inorder()) %>%
count(key, date, movmnt_id, plant, loc, wt = qty, name = "total_qty")
key date movmnt_id plant loc total_qty
1 101 F5P CB00 2018-01-05 101 F5P CB00 100
2 351 F5P CB00 2018-01-05 351 F5P CB00 100
3 101 F5D CB00 2018-01-05 101 F5D CB00 100
4 601 F5D CB00 2018-01-11 601 F5D CB00 100
5 351 RUP MOS1 2018-01-22 351 RUP MOS1 88
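The same first-appearance factor can be built without forcats by passing unique() as the levels. The following is a sketch under some assumptions: it rebuilds the question's data with stringsAsFactors = FALSE so it behaves the same across R versions, it includes date in the key (a choice made here, not in the answer above), and it uses the .groups argument, which needs dplyr 1.0 or later.

```r
library(dplyr)

# Data from the question, rebuilt so the example is self-contained
df <- data.frame(
  movmnt_id = c("101", "101", "351", "601", "601", "351"),
  plant     = c("F5P", "F5D", "F5P", "F5D", "F5D", "RUP"),
  loc       = c("CB00", "CB00", "CB00", "CB00", "CB00", "MOS1"),
  qty       = c(100, 100, 100, 10, 90, 88),
  date      = c("2018-01-05", "2018-01-05", "2018-01-05",
                "2018-01-11", "2018-01-11", "2018-01-22"),
  time      = c("10:38:38", "10:47:17", "10:47:09",
                "17:20:31", "17:20:24", "12:00:54"),
  stringsAsFactors = FALSE
)

res <- df %>%
  arrange(date, time) %>%
  mutate(
    key = paste(movmnt_id, plant, loc, date),
    # levels in order of first appearance, so groups sort chronologically
    key = factor(key, levels = unique(key))
  ) %>%
  # key comes first, so summarise() orders its output by the factor levels
  group_by(key, movmnt_id, plant, loc, date) %>%
  summarise(total_qty = sum(qty), .groups = "drop") %>%
  select(-key)

res
```

Putting the factor first in group_by is what carries the ordering: grouped summaries are sorted by the grouping columns, and factors sort by level rather than alphabetically.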
Implicitly, you want to maintain the order given by the date variable. List date first in the group_by arguments to ensure that the summarise command uses date as the primary sort key. Within a date, rows are still sorted by the remaining grouping columns rather than by time.
df %>%
group_by(date, movmnt_id, plant, loc) %>%
summarise(total_qty = sum(qty)) %>%
ungroup()
#> `summarise()` has grouped output by 'date', 'movmnt_id', 'plant'. You can override using the `.groups` argument.
#> # A tibble: 5 x 5
#> date movmnt_id plant loc total_qty
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 2018-01-05 101 F5D CB00 100
#> 2 2018-01-05 101 F5P CB00 100
#> 3 2018-01-05 351 F5P CB00 100
#> 4 2018-01-11 601 F5D CB00 100
#> 5 2018-01-22 351 RUP MOS1 88
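If the order within a date also has to follow the timestamps, one possible sketch avoids re-sorting after the aggregation altogether: compute the group total with mutate(), which leaves rows where they are, then drop repeated rows with distinct(), which keeps the first occurrence of each combination. This is an alternative to the approach above, not part of it, and the data frame is rebuilt here (with stringsAsFactors = FALSE as an assumption) so the example runs on its own.

```r
library(dplyr)

# Data from the question, rebuilt so the example is self-contained
df <- data.frame(
  movmnt_id = c("101", "101", "351", "601", "601", "351"),
  plant     = c("F5P", "F5D", "F5P", "F5D", "F5D", "RUP"),
  loc       = c("CB00", "CB00", "CB00", "CB00", "CB00", "MOS1"),
  qty       = c(100, 100, 100, 10, 90, 88),
  date      = c("2018-01-05", "2018-01-05", "2018-01-05",
                "2018-01-11", "2018-01-11", "2018-01-22"),
  time      = c("10:38:38", "10:47:17", "10:47:09",
                "17:20:31", "17:20:24", "12:00:54"),
  stringsAsFactors = FALSE
)

res <- df %>%
  arrange(date, time) %>%                 # the initial ordering step from the question
  group_by(movmnt_id, plant, loc, date) %>%
  mutate(total_qty = sum(qty)) %>%        # group total on every row; row order unchanged
  ungroup() %>%
  # keep only the listed columns and the first row of each combination
  distinct(movmnt_id, plant, loc, date, total_qty)

res
```

The two 601 rows collapse into one row with total_qty 100, and the 351 row dated 2018-01-05 stays between the two 101 rows.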