简体   繁体   中英

Dplyr - summarising multiple variables

My original data is structured as such:

Article  Channel1_qty Channel2_qty Channel3_qty

 110        30             10           0
 110        40             0            10
 111        50             5            2
 111        60             3            18

I'm ultimately trying to produce a df that shows sums of articles of clothing sold for each channel_qty as well as counts of the number of articles. Using the above example, it would look something like:

Article_count | channel | Sum (total article qty for channel)
      2            1        180
      2            2        18
      2            3        30

I attempted to structure it this way with the following code, but it didn't work:

df %>%
  select(Article,
         channel1_qty, 
         channel2_qty,
         channel3_qty) %>% 
  gather(key = "channel", value = "value", -Article) %>%
  group_by(channel)
  summarise(
    Article_count = n_distinct(Article),
    total = sum(value)
  )

Tried a few variations of this. Thinking of doing it in separate steps or as a loop, if necessary. I'm thinking there must be an easier / more elegant way in dplyr, though. Thanks!

You are on the right track to tidyr::gather()/tidyr::pivot_longer() , followed by dplyr::group_by() and then finally dplyr::summarize() .

The regex in the names_pattern argument just strips away everything except the actual channel number from the original column names.

library(tidyverse)

d <- structure(list(Article = c(110L, 110L, 111L, 111L), Channel1_qty = c(30L, 40L, 50L, 60L), Channel2_qty = c(10L, 0L, 5L, 3L), Channel3_qty = c(0L, 10L, 2L, 18L)), class = "data.frame", row.names = c(NA, -4L))

d %>% 
  pivot_longer(-Article, 
               names_pattern = "^Channel(.*)_qty", 
               names_to = "channel", 
               values_to = "qty") %>% 
  group_by(channel) %>% 
  summarize(Article_count = n_distinct(Article),
            Sum = sum(qty))
#> # A tibble: 3 × 3
#>   channel Article_count   Sum
#>   <chr>           <int> <int>
#> 1 1                   2   180
#> 2 2                   2    18
#> 3 3                   2    30

Created on 2022-08-04 by the reprex package (v2.0.1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM