Calculating a variable's contribution to total growth in R

Question

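I am trying to compute the contribution of a variable to the growth of a total. The formula is as follows: over the period T to T', the contribution of variable X to the growth of the total variable Y is defined as: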

(Xt/Yt)*((Xt'-Xt)/Xt)*100
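
Written out with subscripts, the definition above simplifies, because the X_t terms cancel. The label C_X below is only a name introduced here for illustration, and the second line assumes that Y is the sum of all the X's in both periods:

```latex
\[
C_X(t, t') \;=\; \frac{X_t}{Y_t}\cdot\frac{X_{t'} - X_t}{X_t}\cdot 100
           \;=\; \frac{X_{t'} - X_t}{Y_t}\cdot 100 \qquad (X_t \neq 0)
\]
\[
\sum_X C_X(t, t') \;=\; \frac{Y_{t'} - Y_t}{Y_t}\cdot 100
\]
```

Under that assumption the contributions add up to the growth rate of the total, which is what makes them readable side by side in a bar chart.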

Here is my dataset:

df <- structure(list(regroupement = c("Autres", "Ortho (+ rhumato et rachis)", 
"Rachis", "Chirurgie digestive", "Ophtalmo", "Uro-néphro", "Gynéco", 
"ORL Stomato sf bouche et dent", "bouche et dents", "Tissus mou et chir plastique", 
"Chir thoracique et vasculaire", "Chir thoracique", "Chir esth et hors sécu", 
"Divers chir", "Gastro", "Endoscopies digestives", "Cardio Vasc (médecine)", 
"Pneumologie", "Neurologie", "Soins palliatifs", "Vasculaire interventionnel", 
"Divers médecine", "Accouchements", "Obstétrique autre (hors IVG)", 
"IVG", "Néo nat", "Séances autres", "Total"), actes_2019 = c(10, 
29520, 395, 14618, 5589, 6515, 4150, 866, 3458, 2137, 449, 0, 
575, 2180, 9179, 36079, 311, 388, 714, 4, 0, 6024, 4028, 294, 
292, 1, 1842, 129618), actes_2020 = c(8, 25451, 308, 12845, 4167, 
7376, 2994, 337, 2206, 2107, 437, 4, 575, 1477, 7933, 30192, 
218, 897, 267, 0, 11, 3740, 3348, 193, 118, 5, 737, 107951), 
    actes_2021 = c(18, 24055, 106, 13735, 5505, 8196, 3376, 352, 
    3035, 2571, 511, 8, 689, 1134, 6504, 42333, 161, 272, 138, 
    7, 0, 4682, 3226, 181, 82, 0, 61, 120938), sejours_2019 = c(4, 
    5493, 44, 2577, 2502, 1221, 852, 260, 1288, 540, 158, 0, 
    236, 397, 1631, 6992, 101, 63, 90, 1, 0, 1028, 1455, 148, 
    246, 1, 1820, 29148), sejours_2020 = c(2, 4946, 34, 2220, 
    1819, 1220, 574, 94, 801, 554, 140, 1, 221, 269, 1335, 5811, 
    79, 42, 58, 0, 1, 726, 1371, 109, 98, 5, 720, 23250), sejours_2021 = c(7, 
    5144, 21, 2523, 2416, 1451, 657, 111, 1106, 649, 162, 1, 
    278, 264, 1109, 7922, 69, 51, 30, 2, 0, 825, 1259, 108, 77, 
    0, 54, 26296)), row.names = c(4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 
24L, 25L, 26L, 27L, 28L, 29L, 30L, 1L), core = structure(list(
    regroupement = c("Autres", "Ortho (+ rhumato et rachis)", 
    "Rachis", "Chirurgie digestive", "Ophtalmo", "Uro-néphro", 
    "Gynéco", "ORL Stomato sf bouche et dent", "bouche et dents", 
    "Tissus mou et chir plastique", "Chir thoracique et vasculaire", 
    "Chir thoracique", "Chir esth et hors sécu", "Divers chir", 
    "Gastro", "Endoscopies digestives", "Cardio Vasc (médecine)", 
    "Pneumologie", "Neurologie", "Soins palliatifs", "Vasculaire interventionnel", 
    "Divers médecine", "Accouchements", "Obstétrique autre (hors IVG)", 
    "IVG", "Néo nat", "Séances autres"), actes_2019 = c(10, 
    29520, 395, 14618, 5589, 6515, 4150, 866, 3458, 2137, 449, 
    0, 575, 2180, 9179, 36079, 311, 388, 714, 4, 0, 6024, 4028, 
    294, 292, 1, 1842), actes_2020 = c(8, 25451, 308, 12845, 
    4167, 7376, 2994, 337, 2206, 2107, 437, 4, 575, 1477, 7933, 
    30192, 218, 897, 267, 0, 11, 3740, 3348, 193, 118, 5, 737
    ), actes_2021 = c(18, 24055, 106, 13735, 5505, 8196, 3376, 
    352, 3035, 2571, 511, 8, 689, 1134, 6504, 42333, 161, 272, 
    138, 7, 0, 4682, 3226, 181, 82, 0, 61), sejours_2019 = c(4, 
    5493, 44, 2577, 2502, 1221, 852, 260, 1288, 540, 158, 0, 
    236, 397, 1631, 6992, 101, 63, 90, 1, 0, 1028, 1455, 148, 
    246, 1, 1820), sejours_2020 = c(2, 4946, 34, 2220, 1819, 
    1220, 574, 94, 801, 554, 140, 1, 221, 269, 1335, 5811, 79, 
    42, 58, 0, 1, 726, 1371, 109, 98, 5, 720), sejours_2021 = c(7, 
    5144, 21, 2523, 2416, 1451, 657, 111, 1106, 649, 162, 1, 
    278, 264, 1109, 7922, 69, 51, 30, 2, 0, 825, 1259, 108, 77, 
    0, 54)), class = "data.frame", row.names = 4:30), tabyl_type = "two_way", totals = 
"row", class = c("tabyl", 
"data.frame"))

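For example, I computed the contribution of the decline in the number of procedures (actes) for the medical specialty "Ortho" between 2020 and 2021 to the overall growth in the number of procedures between those two years: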

25451/107951 * ((24055 - 25451)/25451)*100
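
As a minimal sketch, the hand calculation above can be wrapped in a small base R function so it can be reused for other specialties and periods. The function name `contribution` and its argument names are hypothetical, not part of the original post.

```r
# Hypothetical helper (not from the original post): contribution of X to the
# growth of the total Y between period t and period t', in percentage points.
contribution <- function(x_t, x_t1, y_t) {
  (x_t / y_t) * ((x_t1 - x_t) / x_t) * 100
}

# "Ortho" between 2020 and 2021, using the figures from the dataset above
ortho <- contribution(x_t = 25451, x_t1 = 24055, y_t = 107951)
round(ortho, 2)  # -1.29

# The x_t terms cancel, so the change divided by the total gives the same value
all.equal(ortho, (24055 - 25451) / 107951 * 100)
```

Because the function is vectorised, whole columns can be passed for `x_t` and `x_t1` instead of single numbers.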

I would like to compute this for every specialty over the periods 2020-2021 and 2019-2021, and then draw a bar chart (not stacked) like the second one shown here: http://www.statapprendre.education.fr/insee/croissance/pourquoi/graphique.htm

I think a for loop would be the way to go, but I really don't know how to proceed. Can anyone help?

Answer 1

You can do something like this.

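library(tidyverse)
df %>%
    # Drop the margin row; totals are better computed on the fly
    filter(regroupement != "Total") %>%
    # Keep a handful of categories so the plot stays readable
    filter(str_detect(regroupement, "(Ortho|dents|vasculaire|chir)")) %>%
    # Wide to long: one row per regroupement and year
    pivot_longer(starts_with("actes"), names_to = "year") %>%
    mutate(year = as.integer(str_remove(year, "actes_"))) %>%
    group_by(regroupement) %>%
    # Share of the group's total times the year-on-year change relative to the
    # current year's value (NA for the first year of each group)
    mutate(quantity_of_interest = value / sum(value) * c(NA, diff(value)) / value) %>%
    ungroup() %>%
    ggplot(aes(year, quantity_of_interest, fill = regroupement)) + 
    geom_col(position = "dodge") + 
    labs(x = "Year", y = "Quantity of interest relative to previous year") +
    theme(legend.position = "bottom")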

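Explanation:

Remove the "Total" row (mixing margins with the raw data is never a good/tidy idea; compute margins on the fly instead).
I picked a few regroupement categories at random, because keeping all of them would make the plot very cluttered (you have 28 categories).
The actes_* columns are reshaped from wide to long, and the names are turned into years by removing "actes_" and then applying as.integer.
Compute the quantity of interest; because the data are in long format, group_by and diff make this easy: value / sum(value) * c(NA, diff(value)) / value. Note that because we group by regroupement, sum(value) is just the total for each regroupement. If you want the overall total instead (I'm not quite clear on this point), you need to remove the group_by() and ungroup() lines.
Plot as a bar chart with position = "dodge".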
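
If the denominator should be the yearly total over all specialties, as in the formula from the question, one possible variant is sketched below. It is not the code from the answer above. It assumes the simplified form (X_t' - X_t) / Y_t * 100, which equals the question's formula whenever X_t is non-zero and avoids a NaN for specialties that start at 0. To stay self-contained it uses three rows copied from the question's data plus the figures from its "Total" row; the object names `actes`, `totals`, and `contrib` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Three rows copied from the question's data; the full table works the same way.
actes <- tibble::tribble(
  ~regroupement,                 ~actes_2019, ~actes_2020, ~actes_2021,
  "Ortho (+ rhumato et rachis)",       29520,       25451,       24055,
  "Chirurgie digestive",               14618,       12845,       13735,
  "Endoscopies digestives",            36079,       30192,       42333
)

# Yearly totals over all specialties, taken from the "Total" row of the question.
totals <- c(actes_2019 = 129618, actes_2020 = 107951, actes_2021 = 120938)

# Contribution in percentage points: change in X divided by the total at the
# start of the period. No loop is needed because the arithmetic is vectorised.
contrib <- actes %>%
  mutate(
    `2020-2021` = (actes_2021 - actes_2020) / totals[["actes_2020"]] * 100,
    `2019-2021` = (actes_2021 - actes_2019) / totals[["actes_2019"]] * 100
  ) %>%
  select(regroupement, `2020-2021`, `2019-2021`) %>%
  pivot_longer(-regroupement, names_to = "period", values_to = "contribution")

# Side-by-side (dodged) bars, one pair per specialty
p <- ggplot(contrib, aes(regroupement, contribution, fill = period)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(x = NULL, y = "Contribution to total growth (percentage points)")
print(p)
```

With the full table, the same pipeline could start from `df %>% filter(regroupement != "Total")` and take the totals from `colSums()` of the remaining rows rather than hard-coding them.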

Solution 1
1 Accepted 2022-07-19 10:17:30
