How to calculate the share for each group in dplyr?
I have a data set that shows the number of products for each group and shop.
df <- tribble(
~shop_id, ~group, ~group_2, ~products,
'1', 'A', 'Z', 10,
'2', 'B', 'Y', 20,
'3', 'A', 'X', 30,
'4', 'B', 'X', 40,
'5', 'A', 'R', 10
)
I now want to see the share of products for each shop id within its group, and I want to exclude the group_2 column from the result. For instance, there are 50 products in group A, so the share for shop 1 should be 10/50 = 0.2. Here is the desired output:
df <- tribble(
~shop_id, ~group, ~products, ~share_products,
'1', 'A', 10, 0.2,
'2', 'B', 20, 0.33,
'3', 'A', 30, 0.6,
'4', 'B', 40, 0.66,
'5', 'A', 10, 0.2
)
How can I do this?
After grouping, divide by the sum of 'products'
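library(dplyr)
df1 <- df %>%
  select(-group_2) %>%   # drop the column that should not appear in the output
  group_by(group) %>%
  mutate(share_products = products / sum(products)) %>%   # share within each group
  ungroup()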
-output
df1
# A tibble: 5 × 4
shop_id group products share_products
<chr> <chr> <dbl> <dbl>
1 1 A 10 0.2
2 2 B 20 0.333
3 3 A 30 0.6
4 4 B 40 0.667
5 5 A 10 0.2
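As a quick sanity check on the approach above, the shares within each group should add up to 1. The sketch below rebuilds the question's input data so that it runs on its own; the object name `check` and the column name `total_share` are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Input data from the question
df <- tribble(
  ~shop_id, ~group, ~group_2, ~products,
  '1', 'A', 'Z', 10,
  '2', 'B', 'Y', 20,
  '3', 'A', 'X', 30,
  '4', 'B', 'X', 40,
  '5', 'A', 'R', 10
)

# Share of each shop within its group
df1 <- df %>%
  select(-group_2) %>%
  group_by(group) %>%
  mutate(share_products = products / sum(products)) %>%
  ungroup()

# One row per group; total_share should be 1 up to floating-point error
check <- df1 %>%
  group_by(group) %>%
  summarise(total_share = sum(share_products))

check
```

If a group's total differs noticeably from 1, the grouping was probably dropped before the mutate() step.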
If there are several 'products' columns, loop across those columns to create the corresponding 'share' columns
df1 <- df %>%
  group_by(group) %>%
  mutate(across(contains('products'), ~ .x / sum(.x),
                .names = "share_{.col}")) %>%
  ungroup()
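To illustrate the multi-column case, here is a minimal sketch with a made-up second column. The data frame `df_multi`, the column `products_online`, and all of its values are assumptions for demonstration only and do not come from the question.

```r
library(dplyr)

# Hypothetical data with two columns whose names contain 'products'
df_multi <- tribble(
  ~shop_id, ~group, ~products, ~products_online,
  '1', 'A', 10, 5,
  '2', 'B', 20, 10,
  '3', 'A', 30, 15,
  '4', 'B', 40, 30
)

df_multi_share <- df_multi %>%
  group_by(group) %>%
  # across() applies the same formula to every selected column;
  # .names builds each new column name from the original one
  mutate(across(contains('products'), ~ .x / sum(.x),
                .names = "share_{.col}")) %>%
  ungroup()

names(df_multi_share)
```

With this input the result should gain two columns, share_products and share_products_online, while the original columns are kept unchanged.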
We could use prop.table after grouping
df %>%
select(-group_2) %>%
group_by(group) %>%
mutate(prop = prop.table(products))
shop_id group products prop
<chr> <chr> <dbl> <dbl>
1 1 A 10 0.2
2 2 B 20 0.333
3 3 A 30 0.6
4 4 B 40 0.667
5 5 A 10 0.2
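For a plain numeric vector, prop.table() with its default margin divides each element by the sum of the vector, which is why it gives the same result as products / sum(products) inside a grouped mutate(). A small sketch using the group A values from the question (the variable names are illustrative):

```r
# Products of the three shops in group A
products_a <- c(10, 30, 10)

# Manual share and prop.table() share
manual <- products_a / sum(products_a)
via_prop_table <- prop.table(products_a)

# TRUE when both approaches agree
all.equal(manual, via_prop_table)
```

Note that the pipeline in this answer does not call ungroup(), so the result stays grouped by group; add ungroup() at the end if later steps should operate on the whole table.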