R: summarize based on multiple conditions in dplyr
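I am trying to summarise a data frame to create two summaries: the number of orders in which only QUOT or QUOG appears, and the number of orders in which QUOT or QUOG appears together with other Holds.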
Here is the start of the code:
library(dplyr)

dat <- data.frame(
  Order = c(123, 123, 123, 145, 145, 189, 210, 210, 123, 123, 164),
  Location = c("Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Charlotte", "Charlotte", "Charlotte"),
  Hold = c("QUOT", "ENGR", "VEND", "QUOG", "ENGR", "QUOT", "ENGR", "VEND", "QUOT", "CUST", "QUOT")
)

test <- dat %>%
  group_by(Order, Location) %>%
  .....
I have been trying to work out whether a particular order has only QUOT or QUOG, and then whether it has QUOT or QUOG along with other holds.
Expected output:
   Location Only Multiple
1   Chicago    1        2
2 Charlotte    1        1
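So for the expected output:
Order 123 (Chicago) has QUOT and other holds (ENGR and VEND), so it counts as Multiple for Chicago.
Order 145 has QUOG and another hold (ENGR), so it counts as Multiple for Chicago.
Order 189 has QUOT and no other holds, so it counts as Only for Chicago.
Order 210 has neither QUOT nor QUOG, so it is excluded from the count.
Order 123 (Charlotte) has QUOT and another hold (CUST), so it counts as Multiple for Charlotte.
Order 164 has QUOT and no other holds, so it counts as Only for Charlotte.

I think this should work. You might want to test it with some other orders: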
library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123, 123, 123, 145, 145, 189, 210, 210, 123, 123, 164),
  Location = c("Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Charlotte", "Charlotte", "Charlotte"),
  Hold = c("QUOT", "ENGR", "VEND", "QUOG", "ENGR", "QUOT", "ENGR", "VEND", "QUOT", "CUST", "QUOT")
)

dat %>%
  group_by(Order, Location) %>%
  mutate(
    # TRUE for rows whose hold is QUOT or QUOG
    quot_or_quog = Hold %in% c("QUOT", "QUOG"),
    # 1 if every hold in the order falls on the same side of that test, 2 if mixed
    distinct_quot_or_quog = n_distinct(quot_or_quog)
  ) %>%
  # Remove those that do not have "QUOT" or "QUOG"
  filter(quot_or_quog) %>%
  mutate(
    label = if_else(distinct_quot_or_quog == 1, "Only", "Multiple")
  ) %>%
  # .add = TRUE keeps the Order/Location grouping (spelled add = TRUE before dplyr 1.0.0)
  group_by(label, .add = TRUE) %>%
  summarise(num_label = n_distinct(label)) %>%
  group_by(Location, label) %>%
  count(num_label) %>%
  pivot_wider(
    names_from = label,
    values_from = n
  ) %>%
  select(-num_label)
#> # A tibble: 2 x 3
#> # Groups:   Location [2]
#>   Location  Multiple  Only
#>   <fct>        <int> <int>
#> 1 Charlotte        1     1
#> 2 Chicago          2     1
Created on 2020-02-24 by the reprex package (v0.3.0)
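For comparison, the same per-order rule can be written more directly by collapsing each Order/Location pair to a single row before labelling it. This is only a sketch of an alternative, not the code from the answer above: the names quote_holds, has_quote, has_other, per_order and result are made up for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123, 123, 123, 145, 145, 189, 210, 210, 123, 123, 164),
  Location = c("Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago",
               "Chicago", "Chicago", "Charlotte", "Charlotte", "Charlotte"),
  Hold = c("QUOT", "ENGR", "VEND", "QUOG", "ENGR", "QUOT", "ENGR", "VEND",
           "QUOT", "CUST", "QUOT")
)

# Hold codes that count as a "quote" hold (illustrative helper name)
quote_holds <- c("QUOT", "QUOG")

# One row per Order/Location pair with two flags
per_order <- dat %>%
  group_by(Location, Order) %>%
  summarise(
    has_quote = any(Hold %in% quote_holds),     # at least one QUOT/QUOG
    has_other = any(!(Hold %in% quote_holds)),  # at least one other hold
    .groups = "drop"
  ) %>%
  # orders without QUOT/QUOG are excluded from both counts
  filter(has_quote) %>%
  mutate(label = if_else(has_other, "Multiple", "Only"))

# Count orders per Location and label, then spread the labels into columns
result <- per_order %>%
  count(Location, label) %>%
  pivot_wider(names_from = label, values_from = n, values_fill = list(n = 0L))

result
```

Keeping per_order as an intermediate table makes it easy to inspect how each individual order was classified before the counts are taken.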
Here is another solution using dplyr and tidyr. This time the data is pivoted first, then filtered and summarised to get your result.
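library(dplyr)
library(tidyr)

dat.summary <- dat %>%
  # one indicator column per hold code; NA where an order lacks that hold
  mutate(hold_count = 1) %>%
  pivot_wider(names_from = Hold, values_from = hold_count) %>%
  mutate(
    # QUOT or QUOG present and none of the other holds
    only = if_else((QUOT == 1 | QUOG == 1) & is.na(ENGR) & is.na(VEND) & is.na(CUST), 1, 0),
    # QUOT or QUOG present together with at least one other hold
    multiple = if_else((QUOT == 1 | QUOG == 1) & (ENGR == 1 | VEND == 1 | CUST == 1), 1, 0)
  ) %>%
  group_by(Location) %>%
  # na.rm = TRUE drops the NAs that if_else() returns when its condition is NA
  summarise(only = sum(only, na.rm = TRUE), multiple = sum(multiple, na.rm = TRUE))

dat.summary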
Which gives you:
# A tibble: 2 x 3
  Location   only multiple
  <fct>     <dbl>    <dbl>
1 Charlotte     1        1
2 Chicago       1        2
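Note that the pivot approach names every other hold code (ENGR, VEND, CUST) explicitly, so a new code in the data would be ignored. A possible way around that, sketched here under the assumption that Order and Location are the only non-hold columns, is to work out the "other" columns from the pivoted table itself. The names quote_holds, wide, quote_cols, other_cols and summary_general are illustrative and not part of the answer.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123, 123, 123, 145, 145, 189, 210, 210, 123, 123, 164),
  Location = c("Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago",
               "Chicago", "Chicago", "Charlotte", "Charlotte", "Charlotte"),
  Hold = c("QUOT", "ENGR", "VEND", "QUOG", "ENGR", "QUOT", "ENGR", "VEND",
           "QUOT", "CUST", "QUOT")
)

quote_holds <- c("QUOT", "QUOG")

# Pivot to one row per Order/Location with a 0/1 column per hold code
wide <- dat %>%
  distinct(Order, Location, Hold) %>%  # guard against repeated holds
  mutate(hold_count = 1) %>%
  pivot_wider(
    names_from = Hold,
    values_from = hold_count,
    values_fill = list(hold_count = 0)
  )

# Derive the column sets from the data instead of naming each code
quote_cols <- intersect(names(wide), quote_holds)
other_cols <- setdiff(names(wide), c("Order", "Location", quote_holds))

wide$has_quote <- rowSums(wide[quote_cols]) > 0
wide$has_other <- rowSums(wide[other_cols]) > 0

summary_general <- wide %>%
  group_by(Location) %>%
  summarise(
    only = sum(has_quote & !has_other),
    multiple = sum(has_quote & has_other)
  )

summary_general
```

Filling missing holds with 0 rather than NA also removes the need for na.rm when summing.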
Data
dat <- data.frame(
  Order = c(123, 123, 123, 145, 145, 189, 210, 210, 123, 123, 164),
  Location = c("Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Charlotte", "Charlotte", "Charlotte"),
  Hold = c("QUOT", "ENGR", "VEND", "QUOG", "ENGR", "QUOT", "ENGR", "VEND", "QUOT", "CUST", "QUOT")
)