
R: summarize based on multiple conditions in dplyr

I am trying to summarize a dataframe to create two summaries:

  1. count the number of orders where only QUOT or QUOG appears
  2. count the number of orders where QUOT or QUOG appears together with other Holds

Below is the start of the code:

library(dplyr)


dat <- data.frame(Order = c(123,123,123,145,145,189,210,210,123,123,164), 
                  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
                  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)


test <- dat %>%
  group_by(Order, Location) %>%

  .....

I am stuck on how to determine whether a particular order has only QUOT or QUOG, or has QUOT or QUOG together with other holds.

Expected output:

   Location Only Multiple
1   Chicago    1        2
2 Charlotte    1        1

So for the expected output:

  • Order 123, Chicago: has QUOT and other holds (ENGR and VEND), so it counts as a Multiple for Chicago
  • Order 145, Chicago: has QUOG and another hold (ENGR), so it counts as a Multiple for Chicago
  • Order 189, Chicago: has QUOT and no other holds, so it counts as an Only for Chicago
  • Order 210, Chicago: has neither QUOT nor QUOG, so it is excluded from the count
  • Order 123, Charlotte: has QUOT and another hold (CUST), so it counts as a Multiple for Charlotte
  • Order 164, Charlotte: has QUOT and no other holds, so it counts as an Only for Charlotte

I think this should work -- you may want to test this with a few other Orders:

library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164), 
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

dat %>% 
    group_by(Order, Location) %>% 
    mutate(
        quot_or_quog = Hold %in% c("QUOT", "QUOG"),
        distinct_quot_or_quog = n_distinct(quot_or_quog)
    ) %>% 
    # Remove those that do not have "QUOT" or "QUOG"
    filter(quot_or_quog) %>% 
    mutate(
        label = if_else(distinct_quot_or_quog == 1, "Only", "Multiple")
    ) %>% 
    group_by(label, .add = TRUE) %>%  # dplyr >= 1.0.0 names this argument .add; older versions use add = TRUE
    summarise(num_label = n_distinct(label)) %>% 
    group_by(Location, label) %>%
    count(num_label) %>% 
    pivot_wider(
        names_from = label,
        values_from = n
    ) %>% 
    select(-num_label)
#> # A tibble: 2 x 3
#> # Groups:   Location [2]
#>   Location  Multiple  Only
#>   <fct>        <int> <int>
#> 1 Charlotte        1     1
#> 2 Chicago          2     1

Created on 2020-02-24 by the reprex package (v0.3.0)
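
For comparison, here is a shorter sketch of the same idea. It is not part of the answer above: it collapses each Order/Location pair to two flags with any() and then counts the flags per Location. The names target, has_target, has_other and result are made up for illustration.

library(dplyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164),
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

# Holds of interest (hypothetical helper name)
target <- c("QUOT", "QUOG")

result <- dat %>%
  group_by(Location, Order) %>%
  # One row per order: does it contain a target hold, and does it contain any other hold?
  summarise(
    has_target = any(Hold %in% target),
    has_other  = any(!(Hold %in% target))
  ) %>%
  # summarise() drops the last grouping level, so the data is still grouped by Location here
  # (recent dplyr versions print an informational message about this)
  filter(has_target) %>%
  summarise(
    Only     = sum(!has_other),
    Multiple = sum(has_other)
  )

result

On the sample data this should give Only = 1 and Multiple = 2 for Chicago, and Only = 1 and Multiple = 1 for Charlotte, matching the expected output in the question.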

Here is another solution using dplyr and tidyr. This time the pivoting happens first, and the filtering and summarising are done afterward to get to your solution.

library(dplyr)
library(tidyr)

dat.summary <- dat %>%
  mutate(hold_count = 1) %>% 
  pivot_wider(names_from = Hold, values_from = hold_count) %>% 
  mutate(only = if_else((QUOT == 1 | QUOG == 1) & is.na(ENGR) & is.na(VEND) & is.na(CUST), 1, 0),
         multiple = if_else((QUOT == 1 | QUOG == 1) & (ENGR == 1 | VEND == 1 | CUST == 1), 1, 0)) %>% 
  group_by(Location) %>% 
  summarise(only = sum(only, na.rm = TRUE), multiple = sum(multiple, na.rm = TRUE))

dat.summary

gives you:

# A tibble: 2 x 3
  Location   only multiple
  <fct>     <dbl>    <dbl>
1 Charlotte     1        1
2 Chicago       1        2

DATA

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164), 
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
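
The pivot-first code above names every other hold (ENGR, VEND, CUST) explicitly, so a new hold type would need a code change. As a sketch only, the variant below works out the "other" columns from the pivoted table instead. It assumes each Order/Location/Hold combination appears at most once, and the names target, target_cols, other_cols, n_target, n_other and general_summary are hypothetical.

library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164),
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

target <- c("QUOT", "QUOG")

# One row per Order/Location, one 0/1 column per hold type
wide <- dat %>%
  mutate(hold_count = 1) %>%
  pivot_wider(names_from = Hold, values_from = hold_count,
              values_fill = list(hold_count = 0))

# Split the hold columns into target holds and everything else
target_cols <- intersect(target, names(wide))
other_cols  <- setdiff(names(wide), c("Order", "Location", target))

# Count target and non-target holds per order
wide$n_target <- rowSums(wide[target_cols])
wide$n_other  <- rowSums(wide[other_cols])

general_summary <- wide %>%
  filter(n_target > 0) %>%        # drop orders with neither QUOT nor QUOG
  group_by(Location) %>%
  summarise(only = sum(n_other == 0), multiple = sum(n_other > 0))

general_summary

Filling the missing cells with 0 rather than NA also avoids the is.na() checks and the na.rm = TRUE in the summarise step.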
