R: summarize based on multiple conditions in dplyr

Question

I am trying to summarize a dataframe to create two summaries:

count the number of orders where only QUOT or QUOG appears
count the number of orders where QUOT or QUOG appears together with other Holds

Below is the start of the code:

library(dplyr)


dat <- data.frame(Order = c(123,123,123,145,145,189,210,210,123,123,164), 
                  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
                  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)


test <- dat %>%
  group_by(Order, Location) %>%

  .....

I am stuck on how to determine whether a particular order has only QUOT or QUOG, or has QUOT or QUOG plus other holds.

Expected output:

   Location Only Multiple
1   Chicago    1        2
2 Charlotte    1        1

So for the expected output:

Order 123, Chicago: has QUOT and other holds (ENGR & VEND), so it counts as a multiple for Chicago
Order 145, Chicago: has QUOG and another hold (ENGR), so it counts as a multiple for Chicago
Order 189, Chicago: has QUOT and no other holds, so it counts as an only for Chicago
Order 210, Chicago: has neither QUOT nor QUOG, so this order is excluded from the count
Order 123, Charlotte: has QUOT and another hold (CUST), so it counts as a multiple for Charlotte
Order 164, Charlotte: has QUOT and no other holds, so it counts as an only for Charlotte

Answer 1

I think this should work -- you may want to test this with a few other Orders:

library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164), 
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

dat %>% 
    group_by(Order, Location) %>% 
    mutate(
        quot_or_quog = Hold %in% c("QUOT", "QUOG"),
        distinct_quot_or_quog = n_distinct(quot_or_quog)
    ) %>% 
    # Remove those that do not have "QUOT" or "QUOG"
    filter(quot_or_quog) %>% 
    mutate(
        label = if_else(distinct_quot_or_quog == 1, "Only", "Multiple")
    ) %>% 
    # dplyr >= 1.0.0 spells this argument `.add`; older versions used `add`
    group_by(label, .add = TRUE) %>%
    summarise(num_label = n_distinct(label)) %>% 
    group_by(Location, label) %>%
    count(num_label) %>% 
    pivot_wider(
        names_from = label,
        values_from = n
    ) %>% 
    select(-num_label)
#> # A tibble: 2 x 3
#> # Groups:   Location [2]
#>   Location  Multiple  Only
#>   <fct>        <int> <int>
#> 1 Charlotte        1     1
#> 2 Chicago          2     1

Created on 2020-02-24 by the reprex package (v0.3.0)
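
The same per-order logic can also be sketched more compactly by summarising each order into two flags first. This is only a minimal sketch, not the code above: the names `target`, `has_target`, `all_target` and `res` are made up for illustration, and the `.groups` argument assumes dplyr >= 1.0.0.

```r
library(dplyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164),
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

# Holds of interest (illustrative name, not from the original code)
target <- c("QUOT", "QUOG")

res <- dat %>%
  group_by(Location, Order) %>%
  summarise(
    has_target = any(Hold %in% target),  # at least one QUOT/QUOG row
    all_target = all(Hold %in% target),  # no hold other than QUOT/QUOG
    .groups = "drop"
  ) %>%
  # Drop orders that have neither QUOT nor QUOG
  filter(has_target) %>%
  group_by(Location) %>%
  summarise(
    Only = sum(all_target),
    Multiple = sum(!all_target)
  )

res
```

On the sample data this should give 1 Only and 2 Multiple for Chicago, and 1 of each for Charlotte, matching the expected output in the question.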

Answer 2

Here is another solution using dplyr and tidyr. This time the pivoting happens first, and the filtering and summarising are done afterward to get to your solution.

library(dplyr)
library(tidyr)

dat.summary <- dat %>%
  # Mark every Order/Location/Hold combination that occurs
  mutate(hold_count = 1) %>%
  # One column per hold type; combinations that do not occur become NA
  pivot_wider(names_from = Hold, values_from = hold_count) %>%
  mutate(only = if_else((QUOT == 1 | QUOG == 1) & is.na(ENGR) & is.na(VEND) & is.na(CUST), 1, 0),
         multiple = if_else((QUOT == 1 | QUOG == 1) & (ENGR == 1 | VEND == 1 | CUST == 1), 1, 0)) %>%
  group_by(Location) %>%
  # na.rm = TRUE skips orders where the condition evaluated to NA
  summarise(only = sum(only, na.rm = TRUE), multiple = sum(multiple, na.rm = TRUE))

dat.summary

gives you:

# A tibble: 2 x 3
  Location   only multiple
  <fct>     <dbl>    <dbl>
1 Charlotte     1        1
2 Chicago       1        2

DATA

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164), 
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
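
The pivot-first approach names ENGR, VEND and CUST explicitly, so a new hold type would need a code change. A variant that treats every non-QUOT/QUOG column as "other" might look like the sketch below. It assumes dplyr >= 1.0.0 (for `across()`), tidyr >= 1.1.0 (for a scalar `values_fill`), no duplicated Order/Location/Hold rows, and that both QUOT and QUOG occur in the data. The names `target`, `other`, `wide`, `n_target`, `n_other` and `res2` are illustrative only.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164),
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

# Holds of interest (illustrative name)
target <- c("QUOT", "QUOG")

# One row per Order/Location, one 0/1 column per hold type
wide <- dat %>%
  mutate(hold_count = 1) %>%
  pivot_wider(names_from = Hold, values_from = hold_count, values_fill = 0)

# Every hold column that is not an id column and not QUOT/QUOG
other <- setdiff(names(wide), c("Order", "Location", target))

res2 <- wide %>%
  mutate(
    n_target = rowSums(across(all_of(target))),  # QUOT/QUOG holds on the order
    n_other  = rowSums(across(all_of(other)))    # any other holds on the order
  ) %>%
  # Keep only orders with at least one QUOT/QUOG
  filter(n_target > 0) %>%
  group_by(Location) %>%
  summarise(only = sum(n_other == 0), multiple = sum(n_other > 0))

res2
```

Filling with 0 instead of NA removes the need for `is.na()` checks and `na.rm = TRUE`, at the cost of requiring the newer `values_fill` behaviour.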

2 answers

solution1
3 ACCEPTED 2020-02-24 18:57:40

solution2
0 2020-02-25 22:15:38
