R: summarize based on multiple conditions in dplyr
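I am trying to summarize a data frame to create two summaries:
The number of orders where only QUOT or QUOG appears
The number of orders where QUOT or QUOG appears together with other holds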
Here is the start of the code:
library(dplyr)
dat <- data.frame(Order = c(123,123,123,145,145,189,210,210,123,123,164),
Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
test <- dat %>%
group_by(Order, Location) %>%
.....
I have been trying to work out whether a given order has only QUOT or QUOG, and then whether it has QUOT or QUOG along with other holds.
Expected output:
Location Only Multiple
1 Chicago 1 2
2 Charlotte 1 1
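So for the expected output:
Order 123 (Chicago) has QUOT and other holds (ENGR and VEND), so it counts as Multiple for Chicago.
Order 145 has QUOG and another hold (ENGR), so it counts as Multiple for Chicago.
Order 189 has QUOT and no other holds, so it counts as Only for Chicago.
Order 210 has neither QUOT nor QUOG, so it is excluded from the counts.
Order 123 (Charlotte) has QUOT and another hold (CUST), so it counts as Multiple for Charlotte.
Order 164 has QUOT and no other holds, so it counts as Only for Charlotte.

I think this should work. You may want to test it with some other orders: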
library(dplyr)
library(tidyr)
dat <- data.frame(
Order = c(123,123,123,145,145,189,210,210,123,123,164),
Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
dat %>%
group_by(Order, Location) %>%
mutate(
quot_or_quog = Hold %in% c("QUOT", "QUOG"),
distinct_quot_or_quog = n_distinct(quot_or_quog)
) %>%
# Remove those that do not have "QUOT" or "QUOG"
filter(quot_or_quog) %>%
mutate(
label = if_else(distinct_quot_or_quog == 1, "Only", "Multiple")
) %>%
# `.add` needs dplyr >= 1.0.0; older versions spell this `add = TRUE`
group_by(label, .add = TRUE) %>%
summarise(num_label = n_distinct(label)) %>%
group_by(Location, label) %>%
count(num_label) %>%
pivot_wider(
names_from = label,
values_from = n
) %>%
select(-num_label)
#> # A tibble: 2 x 3
#> # Groups: Location [2]
#> Location Multiple Only
#> <fct> <int> <int>
#> 1 Charlotte 1 1
#> 2 Chicago 2 1
Created on 2020-02-24 by the reprex package (v0.3.0)
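The same per-order classification can also be sketched more directly: collapse each (Location, Order) group into two logical flags, then count the flags per Location. This is only an alternative sketch, not the answer's method. The names `target`, `has_target`, `has_other` and `res` are made up for illustration, and the `.groups` argument assumes dplyr >= 1.0.0.

```r
library(dplyr)

dat <- data.frame(
  Order = c(123, 123, 123, 145, 145, 189, 210, 210, 123, 123, 164),
  Location = c("Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago",
               "Chicago", "Chicago", "Charlotte", "Charlotte", "Charlotte"),
  Hold = c("QUOT", "ENGR", "VEND", "QUOG", "ENGR", "QUOT",
           "ENGR", "VEND", "QUOT", "CUST", "QUOT")
)

# Hold codes of interest (illustrative name, not from the original post)
target <- c("QUOT", "QUOG")

res <- dat %>%
  group_by(Location, Order) %>%
  # One row per order: does it have a target hold, and does it have any other hold?
  summarise(
    has_target = any(Hold %in% target),
    has_other  = any(!Hold %in% target),
    .groups = "drop"
  ) %>%
  # Orders without QUOT or QUOG are excluded from both counts
  filter(has_target) %>%
  group_by(Location) %>%
  summarise(
    Only     = sum(!has_other),
    Multiple = sum(has_other),
    .groups = "drop"
  )

print(res)
```

For the sample data this gives Only = 1 and Multiple = 2 for Chicago, and Only = 1 and Multiple = 1 for Charlotte. Because the flags are computed with `any()`, the sketch does not depend on which other hold codes exist in the data.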
Here is another solution using dplyr and tidyr. This time we pivot first, then flag each order and summarise to get the result.
library(dplyr)
library(tidyr)
dat.summary <- dat %>%
mutate(hold_count = 1) %>%
pivot_wider(names_from = Hold, values_from = hold_count) %>%
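# Holds an order does not have are NA after pivoting, so some flags come out as NA
mutate(only = if_else((QUOT == 1 | QUOG == 1) & is.na(ENGR) & is.na(VEND) & is.na(CUST), 1, 0),
multiple = if_else((QUOT == 1 | QUOG == 1) & (ENGR == 1 | VEND == 1 | CUST == 1), 1, 0)) %>%
group_by(Location) %>%
# na.rm = TRUE drops those NA flags instead of turning the whole sum into NA
summarise(only = sum(only, na.rm = TRUE), multiple = sum(multiple, na.rm = TRUE))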
dat.summary
which gives you:
# A tibble: 2 x 3
Location only multiple
<fct> <dbl> <dbl>
1 Charlotte 1 1
2 Chicago 1 2
Data
dat <- data.frame(
Order = c(123,123,123,145,145,189,210,210,123,123,164),
Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
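The pivot-based solution relies on how R handles NA in logical expressions, which is why the sums are taken with NA values removed. The following base R sketch illustrates that behaviour with three hypothetical orders. The vectors `quot`, `quog` and `engr` are made-up stand-ins for the pivoted columns, and base `ifelse()` is used here in place of `dplyr::if_else()`, which also returns NA for an NA condition.

```r
# Stand-ins for pivoted hold columns: 1 if the order has the hold, NA otherwise
quot <- c(1, NA, NA)
quog <- c(NA, NA, 1)
engr <- c(NA, 1, 1)

# TRUE | NA is TRUE, but NA | NA stays NA
has_target <- quot == 1 | quog == 1
# A missing hold compares to NA, not FALSE
has_other <- engr == 1
# TRUE & NA and NA & TRUE are both NA
multiple_flag <- has_target & has_other

# An NA condition yields an NA result
flag_num <- ifelse(multiple_flag, 1, 0)

print(multiple_flag)
print(sum(flag_num))                # NA propagates through the sum
print(sum(flag_num, na.rm = TRUE))  # NA flags are skipped
```

Only the third order is flagged, so the sum with `na.rm = TRUE` is 1, while the plain sum is NA.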