R: summarize based on multiple conditions in dplyr
I am trying to summarize a dataframe to create two summaries:
- the number of orders where only QUOT or QUOG appear
- the number of orders where QUOT or QUOG appear and where there are other Holds appearing too
Below is the start of the code:
library(dplyr)
dat <- data.frame(Order = c(123,123,123,145,145,189,210,210,123,123,164),
                  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
                  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
test <- dat %>%
  group_by(Order, Location) %>%
  .....
I get stuck trying to find out whether a particular order has only QUOT or QUOG, and then whether it has QUOT or QUOG along with other holds.
Expected output:
   Location Only Multiple
1   Chicago    1        2
2 Charlotte    1        1
So for the expected output:
- Order 123 in Chicago has QUOT in it and other holds (ENGR & VEND), so it counts as a Multiple for Chicago
- Order 145 in Chicago has QUOG in it and another hold (ENGR), so it counts as a Multiple for Chicago
- Order 189 in Chicago has QUOT in it and no other holds, so it counts as an Only for Chicago
- Order 210 in Chicago does not have QUOT or QUOG, so this order is excluded from the count
- Order 123 in Charlotte has QUOT in it and another hold (CUST), so it counts as a Multiple for Charlotte
- Order 164 in Charlotte has QUOT in it and no other holds, so it counts as an Only for Charlotte
I think this should work, but you may want to test it with a few other orders:
library(dplyr)
library(tidyr)
dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164),
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
dat %>%
  group_by(Order, Location) %>%
  mutate(
    # TRUE for rows whose Hold is one of the two holds of interest
    quot_or_quog = Hold %in% c("QUOT", "QUOG"),
    # 1 if every row of the order is the same kind (all target or all other), 2 if it mixes both
    distinct_quot_or_quog = n_distinct(quot_or_quog)
  ) %>%
  # Remove those that do not have "QUOT" or "QUOG"
  filter(quot_or_quog) %>%
  mutate(
    label = if_else(distinct_quot_or_quog == 1, "Only", "Multiple")
  ) %>%
  # In dplyr >= 1.0.0 this argument is spelled `.add`; `add` is the older, now-deprecated name
  group_by(label, add = TRUE) %>%
  summarise(num_label = n_distinct(label)) %>%
  group_by(Location, label) %>%
  count(num_label) %>%
  pivot_wider(
    names_from = label,
    values_from = n
  ) %>%
  select(-num_label)
#> # A tibble: 2 x 3
#> # Groups:   Location [2]
#>   Location  Multiple  Only
#>   <fct>        <int> <int>
#> 1 Charlotte        1     1
#> 2 Chicago          2     1
Created on 2020-02-24 by the reprex package (v0.3.0)
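The same classification can also be written more directly. This is a minimal sketch, not the answer's own method: it collapses each order to two flags (does it contain a target hold, does it contain any other hold) and then counts per location. The names `target`, `per_order`, `has_target`, `has_other` and `result` are introduced here for illustration only.

```r
library(dplyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164),
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

# Holds of interest (illustrative name, not from the original posts)
target <- c("QUOT", "QUOG")

# One row per (Location, Order) with two logical flags
per_order <- dat %>%
  group_by(Location, Order) %>%
  summarise(
    has_target = any(Hold %in% target),    # at least one QUOT/QUOG row
    has_other  = any(!(Hold %in% target))  # at least one row with a different hold
  ) %>%
  ungroup()

# Keep orders that contain a target hold, then count the two cases per location
result <- per_order %>%
  filter(has_target) %>%
  group_by(Location) %>%
  summarise(
    Only     = sum(!has_other),
    Multiple = sum(has_other)
  )

result
```

Orders without QUOT or QUOG (such as 210 in Chicago) drop out at the `filter()` step, so they contribute to neither column.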
Here is another solution using dplyr and tidyr. This time the pivoting happens first, and the filtering and summarising are done afterward to get to your solution.
library(dplyr)
library(tidyr)
dat.summary <- dat %>%
  # One indicator column per Hold value; NA where an order lacks that hold
  mutate(hold_count = 1) %>%
  pivot_wider(names_from = Hold, values_from = hold_count) %>%
  mutate(
    only = if_else((QUOT == 1 | QUOG == 1) & is.na(ENGR) & is.na(VEND) & is.na(CUST), 1, 0),
    multiple = if_else((QUOT == 1 | QUOG == 1) & (ENGR == 1 | VEND == 1 | CUST == 1), 1, 0)
  ) %>%
  group_by(Location) %>%
  # if_else() returns NA when its condition is NA, so drop NAs when summing
  summarise(only = sum(only, na.rm = TRUE), multiple = sum(multiple, na.rm = TRUE))
dat.summary
gives you:
# A tibble: 2 x 3
  Location   only multiple
  <fct>     <dbl>    <dbl>
1 Charlotte     1        1
2 Chicago       1        2
DATA
dat <- data.frame(
Order = c(123,123,123,145,145,189,210,210,123,123,164),
Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
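The pivot-first approach above names every hold column explicitly, so a new hold type would need a code change. As a hedged sketch (not part of the original answer), the non-target columns can instead be derived from the column names of the widened table. `target`, `wide`, `target_cols`, `other_cols`, `has_target`, `has_other` and `result_wide` are illustrative names. `values_fill` is passed as a named list on the assumption that the installed tidyr accepts that form.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164),
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

# Holds of interest (illustrative name)
target <- c("QUOT", "QUOG")

# One row per (Order, Location), one 0/1 column per hold type
wide <- dat %>%
  mutate(hold_count = 1) %>%
  pivot_wider(
    names_from = Hold,
    values_from = hold_count,
    values_fill = list(hold_count = 0)  # 0 instead of NA for absent holds
  )

# Split the hold columns into target and non-target without naming the latter
target_cols <- intersect(target, names(wide))
other_cols  <- setdiff(names(wide), c("Order", "Location", target_cols))

result_wide <- wide %>%
  mutate(
    has_target = rowSums(wide[target_cols]) > 0,
    has_other  = rowSums(wide[other_cols]) > 0
  ) %>%
  filter(has_target) %>%
  group_by(Location) %>%
  summarise(only = sum(!has_other), multiple = sum(has_other))

result_wide
```

For the sample data this should reproduce the counts shown in the answers. It is worth checking against other orders before relying on it.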