
R: summarize based on multiple conditions in dplyr

I am trying to summarize a data frame to create two summaries:

  1. count the number of orders where only QUOT or QUOG appears
  2. count the number of orders where QUOT or QUOG appears together with other Holds

Below is the start of the code:

library(dplyr)


dat <- data.frame(Order = c(123,123,123,145,145,189,210,210,123,123,164), 
                  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
                  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)


test <- dat %>%
  group_by(Order, Location) %>%

  .....

I get stuck trying to find out whether a particular order has only QUOT or QUOG, or has QUOT or QUOG along with other holds.

Expected output:

   Location Only Multiple
1   Chicago    1        2
2 Charlotte    1        1

So for the expected output:

  • Order 123, Chicago: has QUOT in it and other holds (ENGR & VEND), so this would be considered a "Multiple" for Chicago
  • Order 145, Chicago: has QUOG in it and another hold (ENGR), so this would be considered a "Multiple" for Chicago
  • Order 189, Chicago: has QUOT in it and no other holds, so this would be considered an "Only" for Chicago
  • Order 210, Chicago: has neither QUOT nor QUOG, so this order is excluded from the count
  • Order 123, Charlotte: has QUOT in it and another hold (CUST), so this would be considered a "Multiple" for Charlotte
  • Order 164, Charlotte: has QUOT in it and no other holds, so this would be considered an "Only" for Charlotte
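
The rules in the list above can be written as a small base R function that labels a single order's holds. This is only a sketch for illustration: `classify_holds` is a hypothetical helper name, not part of dplyr or of the original post.

```r
# Hypothetical helper (not from the original post): label one order's holds.
classify_holds <- function(holds) {
  is_quote <- holds %in% c("QUOT", "QUOG")
  if (!any(is_quote)) {
    NA_character_   # neither QUOT nor QUOG: excluded from the counts
  } else if (all(is_quote)) {
    "Only"          # QUOT/QUOG and nothing else
  } else {
    "Multiple"      # QUOT/QUOG plus at least one other hold
  }
}

classify_holds(c("QUOT", "ENGR", "VEND"))  # order 123, Chicago -> "Multiple"
classify_holds("QUOT")                     # order 189, Chicago -> "Only"
classify_holds(c("ENGR", "VEND"))          # order 210, Chicago -> NA
```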

I think this should work, but you may want to test it with a few other Orders:

library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164), 
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

dat %>% 
    group_by(Order, Location) %>% 
    mutate(
        quot_or_quog = Hold %in% c("QUOT", "QUOG"),
        distinct_quot_or_quog = n_distinct(quot_or_quog)
    ) %>% 
    # Remove those that do not have "QUOT" or "QUOG"
    filter(quot_or_quog) %>% 
    mutate(
        label = if_else(distinct_quot_or_quog == 1, "Only", "Multiple")
    ) %>% 
    # `.add` replaces the deprecated `add` argument (dplyr >= 1.0.0)
    group_by(label, .add = TRUE) %>%
    summarise(num_label = n_distinct(label)) %>% 
    group_by(Location, label) %>%
    count(num_label) %>% 
    pivot_wider(
        names_from = label,
        values_from = n
    ) %>% 
    select(-num_label)
#> # A tibble: 2 x 3
#> # Groups:   Location [2]
#>   Location  Multiple  Only
#>   <fct>        <int> <int>
#> 1 Charlotte        1     1
#> 2 Chicago          2     1

Created on 2020-02-24 by the reprex package (v0.3.0)
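
The labelling step in the pipeline above relies on how n_distinct() behaves on a logical vector. The short sketch below (the vectors are made up for illustration) shows why a count of 1 means "Only" once the all-FALSE groups have been filtered out:

```r
library(dplyr)

# Mixed TRUE/FALSE: the order has QUOT/QUOG and other holds
n_distinct(c(TRUE, FALSE, FALSE))  # 2

# All TRUE: the order has only QUOT/QUOG
n_distinct(c(TRUE))                # 1

# All FALSE also gives 1, which is why filter(quot_or_quog)
# must drop these rows before the label is assigned
n_distinct(c(FALSE, FALSE))        # 1
```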

Here is another solution using dplyr and tidyr. This time the pivoting happens first, and the filtering and summarising are done afterward to get to your solution.

library(dplyr)
library(tidyr)

dat.summary <- dat %>%
  mutate(hold_count = 1) %>% 
  pivot_wider(names_from = Hold, values_from = hold_count) %>% 
  mutate(only = if_else((QUOT == 1 | QUOG == 1) & is.na(ENGR) & is.na(VEND) & is.na(CUST), 1, 0),
         multiple = if_else((QUOT == 1 | QUOG == 1) & (ENGR == 1 | VEND == 1 | CUST == 1), 1, 0)) %>% 
  group_by(Location) %>% 
  summarise(only = sum(only, na.rm = TRUE), multiple = sum(multiple, na.rm = TRUE))

dat.summary

gives you:

# A tibble: 2 x 3
  Location   only multiple
  <fct>     <dbl>    <dbl>
1 Charlotte     1        1
2 Chicago       1        2

DATA

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164), 
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
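
As a possible variation on the two approaches above, the same counts can be computed without pivoting and without hard-coding the other hold names, by summarising each order with any() and all(). This is a sketch, not taken from either answer. It assumes dplyr >= 1.0.0 for the `.groups` argument, and `quote_holds`, `per_order`, and `result` are names chosen here for illustration.

```r
library(dplyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164),
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago",
               "Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND",
           "QUOT","CUST","QUOT")
)

quote_holds <- c("QUOT", "QUOG")

# One row per order within a location, with two logical flags
per_order <- dat %>%
  group_by(Location, Order) %>%
  summarise(
    has_quote  = any(Hold %in% quote_holds),  # at least one QUOT/QUOG
    only_quote = all(Hold %in% quote_holds),  # nothing but QUOT/QUOG
    .groups = "drop"
  )

# Drop orders without QUOT/QUOG, then count both cases per location
result <- per_order %>%
  filter(has_quote) %>%
  group_by(Location) %>%
  summarise(
    Only     = sum(only_quote),
    Multiple = sum(!only_quote),
    .groups = "drop"
  )

result
```

For the sample data this should give Only = 1 and Multiple = 2 for Chicago, and Only = 1 and Multiple = 1 for Charlotte, matching the expected output in the question. Because the flags are computed from `Hold` directly, a new hold code would not require changing the pipeline.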
