How to calculate the share of each group in dplyr?

Question

I have a data set that shows the number of products for each group and shop.

library(tibble)  # provides tribble()

df <- tribble(
  ~shop_id, ~group, ~group_2, ~products,
  '1',      'A',    'Z',      10,
  '2',      'B',    'Y',      20,
  '3',      'A',    'X',      30,
  '4',      'B',    'X',      40,
  '5',      'A',    'R',      10
)

I now want to see each shop's share of the products within its group, without the group_2 column in the result. For instance, there are 50 products in group A in total, so the share for shop 1 should be 10 / 50 = 0.2. Here is the desired output:

# Stored under a separate name so the input df above is not overwritten
expected <- tribble(
  ~shop_id, ~group, ~products, ~share_products,
  '1',      'A',    10,        0.2,
  '2',      'B',    20,        0.33,
  '3',      'A',    30,        0.6,
  '4',      'B',    40,        0.67,
  '5',      'A',    10,        0.2
)

How can I do this?

Answer 1

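After grouping by group, divide products by the sum of products within each group:

library(dplyr)
df1 <- df %>%
  select(shop_id, group, products) %>%   # keep only the columns wanted in the result
  group_by(group) %>%
  mutate(share_products = products / sum(products)) %>%   # sum() is evaluated per group
  ungroup()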

Output:

df1
# A tibble: 5 × 4
  shop_id group products share_products
  <chr>   <chr>    <dbl>          <dbl>
1 1       A           10          0.2  
2 2       B           20          0.333
3 3       A           30          0.6  
4 4       B           40          0.667
5 5       A           10          0.2
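
As a possible alternative, assuming dplyr 1.1.0 or later is installed, the grouping can be written inline with the `.by` argument of `mutate()`, which groups for that single call and returns an ungrouped result. This is only a sketch using the data from the question; the name `df2` is chosen here for illustration.

```r
library(dplyr)
library(tibble)

# Same data as in the question
df <- tribble(
  ~shop_id, ~group, ~group_2, ~products,
  '1',      'A',    'Z',      10,
  '2',      'B',    'Y',      20,
  '3',      'A',    'X',      30,
  '4',      'B',    'X',      40,
  '5',      'A',    'R',      10
)

# .by groups only for this mutate() call, so no ungroup() is needed afterwards
df2 <- df %>%
  select(-group_2) %>%
  mutate(share_products = products / sum(products), .by = group)

df2
```

On older dplyr versions `.by` is not available, and the `group_by()` / `ungroup()` form above is the one to use.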

If there are several 'products' columns, use across() to loop over them and create the corresponding 'share' columns:

df1 <- df %>%
  group_by(group) %>%
  mutate(across(contains('products'), ~ .x / sum(.x),
                .names = "share_{.col}")) %>%
  ungroup()
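
To illustrate the `across()` variant, here is a small sketch with two product columns. The column names `products_2021` and `products_2022` and their values are made up for this example and do not come from the question.

```r
library(dplyr)
library(tibble)

# Hypothetical data: two 'products' columns per shop
df_multi <- tribble(
  ~shop_id, ~group, ~products_2021, ~products_2022,
  '1',      'A',    10,             5,
  '2',      'B',    20,             15,
  '3',      'A',    30,             15,
  '4',      'B',    40,             45,
  '5',      'A',    10,             5
)

shares <- df_multi %>%
  group_by(group) %>%
  # one share column per matching input column, named share_<original name>
  mutate(across(contains('products'), ~ .x / sum(.x),
                .names = "share_{.col}")) %>%
  ungroup()

shares
```

The result should contain `share_products_2021` and `share_products_2022` next to the original columns, each summing to 1 within a group.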

Answer 2

We could use prop.table after grouping:

df %>%
  select(-group_2) %>% 
  group_by(group) %>%
  mutate(prop = prop.table(products))

  shop_id group products  prop
  <chr>   <chr>    <dbl> <dbl>
1 1       A           10 0.2  
2 2       B           20 0.333
3 3       A           30 0.6  
4 4       B           40 0.667
5 5       A           10 0.2
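
For a numeric vector, `prop.table(x)` returns `x / sum(x)`, which is why it gives the same shares as the first answer. The sketch below checks this in base R and also shows a base-R grouped version with `ave()`; the names `df_base`, `products_a` and `shares_a` are chosen here for illustration. Note that the dplyr pipeline above leaves the result grouped, so adding `ungroup()` at the end may be useful.

```r
# Group A from the question: 10, 30 and 10 products
products_a <- c(10, 30, 10)
shares_a <- prop.table(products_a)

# prop.table() on a vector is the same as dividing by the total
same <- isTRUE(all.equal(shares_a, products_a / sum(products_a)))

# Base-R sketch of the grouped calculation, without dplyr
df_base <- data.frame(
  shop_id  = c('1', '2', '3', '4', '5'),
  group    = c('A', 'B', 'A', 'B', 'A'),
  products = c(10, 20, 30, 40, 10)
)
# ave() applies FUN within each level of group and keeps the row order
df_base$prop <- ave(df_base$products, df_base$group, FUN = prop.table)

df_base
```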

2 answers

Answer 1
1 2022-01-19 21:16:34

Answer 2
0 2022-01-19 21:58:46
