
Calculating the contribution of a variable to the growth of a total in R

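I'm currently trying to calculate the contribution of a variable to the growth of a total. The formula is as follows: over the period T to T', the contribution of variable X to the growth of the total Y is defined as: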

(Xt/Yt)*((Xt'-Xt)/Xt)*100
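Assuming Y is the sum of its components X (which holds for the Total row in the data below), the definition simplifies because X_t cancels, and the contributions of all components add up to the growth rate of the total. The first form is undefined when X_t = 0; the simplified form is not.

```latex
C_X = \frac{X_t}{Y_t}\cdot\frac{X_{t'} - X_t}{X_t}\times 100
    = \frac{X_{t'} - X_t}{Y_t}\times 100 \qquad (X_t \neq 0)

\text{If } Y = \sum_i X_i:\quad
\sum_i C_{X_i} = \frac{\sum_i \left(X_{i,t'} - X_{i,t}\right)}{Y_t}\times 100
               = \frac{Y_{t'} - Y_t}{Y_t}\times 100
```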

Here is my dataset:

df <- structure(list(regroupement = c("Autres", "Ortho (+ rhumato et rachis)", 
"Rachis", "Chirurgie digestive", "Ophtalmo", "Uro-néphro", "Gynéco", 
"ORL Stomato sf bouche et dent", "bouche et dents", "Tissus mou et chir plastique", 
"Chir thoracique et vasculaire", "Chir thoracique", "Chir esth et hors sécu", 
"Divers chir", "Gastro", "Endoscopies digestives", "Cardio Vasc (médecine)", 
"Pneumologie", "Neurologie", "Soins palliatifs", "Vasculaire interventionnel", 
"Divers médecine", "Accouchements", "Obstétrique autre (hors IVG)", 
"IVG", "Néo nat", "Séances autres", "Total"), actes_2019 = c(10, 
29520, 395, 14618, 5589, 6515, 4150, 866, 3458, 2137, 449, 0, 
575, 2180, 9179, 36079, 311, 388, 714, 4, 0, 6024, 4028, 294, 
292, 1, 1842, 129618), actes_2020 = c(8, 25451, 308, 12845, 4167, 
7376, 2994, 337, 2206, 2107, 437, 4, 575, 1477, 7933, 30192, 
218, 897, 267, 0, 11, 3740, 3348, 193, 118, 5, 737, 107951), 
    actes_2021 = c(18, 24055, 106, 13735, 5505, 8196, 3376, 352, 
    3035, 2571, 511, 8, 689, 1134, 6504, 42333, 161, 272, 138, 
    7, 0, 4682, 3226, 181, 82, 0, 61, 120938), sejours_2019 = c(4, 
    5493, 44, 2577, 2502, 1221, 852, 260, 1288, 540, 158, 0, 
    236, 397, 1631, 6992, 101, 63, 90, 1, 0, 1028, 1455, 148, 
    246, 1, 1820, 29148), sejours_2020 = c(2, 4946, 34, 2220, 
    1819, 1220, 574, 94, 801, 554, 140, 1, 221, 269, 1335, 5811, 
    79, 42, 58, 0, 1, 726, 1371, 109, 98, 5, 720, 23250), sejours_2021 = c(7, 
    5144, 21, 2523, 2416, 1451, 657, 111, 1106, 649, 162, 1, 
    278, 264, 1109, 7922, 69, 51, 30, 2, 0, 825, 1259, 108, 77, 
    0, 54, 26296)), row.names = c(4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 
24L, 25L, 26L, 27L, 28L, 29L, 30L, 1L), core = structure(list(
    regroupement = c("Autres", "Ortho (+ rhumato et rachis)", 
    "Rachis", "Chirurgie digestive", "Ophtalmo", "Uro-néphro", 
    "Gynéco", "ORL Stomato sf bouche et dent", "bouche et dents", 
    "Tissus mou et chir plastique", "Chir thoracique et vasculaire", 
    "Chir thoracique", "Chir esth et hors sécu", "Divers chir", 
    "Gastro", "Endoscopies digestives", "Cardio Vasc (médecine)", 
    "Pneumologie", "Neurologie", "Soins palliatifs", "Vasculaire interventionnel", 
    "Divers médecine", "Accouchements", "Obstétrique autre (hors IVG)", 
    "IVG", "Néo nat", "Séances autres"), actes_2019 = c(10, 
    29520, 395, 14618, 5589, 6515, 4150, 866, 3458, 2137, 449, 
    0, 575, 2180, 9179, 36079, 311, 388, 714, 4, 0, 6024, 4028, 
    294, 292, 1, 1842), actes_2020 = c(8, 25451, 308, 12845, 
    4167, 7376, 2994, 337, 2206, 2107, 437, 4, 575, 1477, 7933, 
    30192, 218, 897, 267, 0, 11, 3740, 3348, 193, 118, 5, 737
    ), actes_2021 = c(18, 24055, 106, 13735, 5505, 8196, 3376, 
    352, 3035, 2571, 511, 8, 689, 1134, 6504, 42333, 161, 272, 
    138, 7, 0, 4682, 3226, 181, 82, 0, 61), sejours_2019 = c(4, 
    5493, 44, 2577, 2502, 1221, 852, 260, 1288, 540, 158, 0, 
    236, 397, 1631, 6992, 101, 63, 90, 1, 0, 1028, 1455, 148, 
    246, 1, 1820), sejours_2020 = c(2, 4946, 34, 2220, 1819, 
    1220, 574, 94, 801, 554, 140, 1, 221, 269, 1335, 5811, 79, 
    42, 58, 0, 1, 726, 1371, 109, 98, 5, 720), sejours_2021 = c(7, 
    5144, 21, 2523, 2416, 1451, 657, 111, 1106, 649, 162, 1, 
    278, 264, 1109, 7922, 69, 51, 30, 2, 0, 825, 1259, 108, 77, 
    0, 54)), class = "data.frame", row.names = 4:30), tabyl_type = "two_way", totals = 
"row", class = c("tabyl", 
"data.frame"))

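For example, I calculated the contribution of the fall in the number of procedures in the "Ortho" specialty between 2020 and 2021 to the overall growth in the number of procedures between those two years: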

25451/107951 * ((24055 - 25451)/25451)*100
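As a quick check of the arithmetic, the example can be evaluated directly in R. Because X_t (here 25451) appears in both factors, it cancels, so the value is simply the change in Ortho divided by the 2020 Total. The variable names are illustrative.

```r
# Figures from the question: Ortho in 2020 and 2021, and the 2020 Total
x_2020 <- 25451
x_2021 <- 24055
y_2020 <- 107951

# The question's formula, as written
literal <- x_2020 / y_2020 * ((x_2021 - x_2020) / x_2020) * 100

# The same quantity after cancelling x_2020
simplified <- (x_2021 - x_2020) / y_2020 * 100

stopifnot(isTRUE(all.equal(literal, simplified)))
print(round(simplified, 2))  # [1] -1.29
```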

I would like to compute this for each specialty over the periods 2020-2021 and 2019-2021, and then draw a (non-stacked) bar chart like the second one shown here: http://www.statapprendre.education.fr/insee/croissance/pourquoi/graphique.htm

I think a for loop would be appropriate, but I really don't know how to go about it. Can anyone help?
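A for loop is not strictly needed: R arithmetic is vectorised, so the formula can be applied to whole columns at once. The sketch below copies the actes_* columns from the question (without the Total row) and uses a hypothetical helper, contribution(), that takes Y_t as the column sum of the base year; this matches the question's Total row. It uses the simplified form (X_t' - X_t) / Y_t * 100, which gives the same result as the original formula but stays finite for specialties with 0 procedures in the base year (e.g. "Chir thoracique" in 2019), where the literal form would produce NaN.

```r
# Number of procedures ("actes") per specialty, copied from the question's
# data with the "Total" row left out.
specialty <- c(
  "Autres", "Ortho (+ rhumato et rachis)", "Rachis", "Chirurgie digestive",
  "Ophtalmo", "Uro-néphro", "Gynéco", "ORL Stomato sf bouche et dent",
  "bouche et dents", "Tissus mou et chir plastique",
  "Chir thoracique et vasculaire", "Chir thoracique", "Chir esth et hors sécu",
  "Divers chir", "Gastro", "Endoscopies digestives", "Cardio Vasc (médecine)",
  "Pneumologie", "Neurologie", "Soins palliatifs", "Vasculaire interventionnel",
  "Divers médecine", "Accouchements", "Obstétrique autre (hors IVG)", "IVG",
  "Néo nat", "Séances autres"
)
actes_2019 <- c(10, 29520, 395, 14618, 5589, 6515, 4150, 866, 3458, 2137, 449,
                0, 575, 2180, 9179, 36079, 311, 388, 714, 4, 0, 6024, 4028,
                294, 292, 1, 1842)
actes_2020 <- c(8, 25451, 308, 12845, 4167, 7376, 2994, 337, 2206, 2107, 437,
                4, 575, 1477, 7933, 30192, 218, 897, 267, 0, 11, 3740, 3348,
                193, 118, 5, 737)
actes_2021 <- c(18, 24055, 106, 13735, 5505, 8196, 3376, 352, 3035, 2571, 511,
                8, 689, 1134, 6504, 42333, 161, 272, 138, 7, 0, 4682, 3226,
                181, 82, 0, 61)

# Hypothetical helper: contribution of every component, in percentage points.
# Y_t is the column sum of the base year (equal to the question's Total row).
contribution <- function(x_from, x_to) {
  (x_to - x_from) / sum(x_from) * 100
}

# One call per period, no loop over specialties
contrib <- data.frame(
  specialty  = specialty,
  p2020_2021 = contribution(actes_2020, actes_2021),
  p2019_2021 = contribution(actes_2019, actes_2021)
)

# The contributions add up to the growth rate of the total over each period
stopifnot(
  isTRUE(all.equal(sum(contrib$p2020_2021), (sum(actes_2021) / sum(actes_2020) - 1) * 100)),
  isTRUE(all.equal(sum(contrib$p2019_2021), (sum(actes_2021) / sum(actes_2019) - 1) * 100))
)
print(head(contrib))
```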

你可以做这样的事情。

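library(tidyverse)
df %>%
    # Drop the margin row; totals can be recomputed when needed
    filter(regroupement != "Total") %>%
    # Keep only a handful of specialties so the plot stays readable
    filter(str_detect(regroupement, "(Ortho|dents|vasculaire|chir)")) %>%
    # Wide to long: one row per specialty and year
    pivot_longer(starts_with("actes"), names_to = "year") %>%
    mutate(year = as.integer(str_remove(year, "actes_"))) %>%
    group_by(regroupement) %>%
    # Change from the previous year, divided by the specialty's own sum over all years
    mutate(quantity_of_interest = value / sum(value) * c(NA, diff(value)) / value) %>%
    ungroup() %>%
    # One bar per specialty, placed side by side within each year
    ggplot(aes(year, quantity_of_interest, fill = regroupement)) + 
    geom_col(position = "dodge") + 
    labs(x = "Year", y = "Quantity of interest relative to previous year") +
    theme(legend.position = "bottom")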
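To see what the mutate() step above computes, here it is applied by hand to the Ortho group's three yearly values (2019, 2020, 2021, copied from the question's data). Since value appears in both the numerator and the denominator, it cancels: the result is the year-on-year change divided by the specialty's own three-year sum (79026 for Ortho), expressed as a fraction. This differs from the question's definition, which divides by the Total of the base year and multiplies by 100.

```r
# Ortho's number of procedures in 2019, 2020 and 2021 (from the question's data)
value <- c(29520, 25451, 24055)

# The expression used in the answer's mutate() step, for this single group
answer_quantity <- value / sum(value) * c(NA, diff(value)) / value

# `value` cancels out: change from the previous year / group's own sum
simplified <- c(NA, diff(value)) / sum(value)

stopifnot(isTRUE(all.equal(answer_quantity, simplified)))
print(answer_quantity)
```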


Explanation:

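  1. Remove the "Total" row (mixing margins with the raw data is never a good/tidy idea; compute margins on the fly instead).
  2. I picked a few regroupement categories more or less at random, because keeping all of them makes the plot very cluttered (you have 27 categories, plus the Total row).
  3. The actes_* columns are reshaped from wide to long, and the column names are turned into years by removing "actes_" and then converting with as.integer.
  4. Compute the quantity of interest; because the data are in long format, group_by and diff make this easy: value / sum(value) * c(NA, diff(value)) / value. Note that because we group by regroupement, sum(value) is just the total of each regroupement over the years. If you want the overall total instead (I'm not entirely clear on this point), you need to remove the group_by() and ungroup() lines; note, however, that diff() would then also take differences between adjacent rows belonging to different regroupement values.
  5. Plot as a bar chart with geom_col and position = "dodge".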
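If the intended denominator is the Total of the base year, as in the question's formula, one option is to compute both periods directly on the wide columns before reshaping. This is a sketch, not the answerer's code: to keep it self-contained it reproduces only three specialties plus the Total row from the question, and the period labels "2020-2021" and "2019-2021" are illustrative column names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# A few rows of the question's data, plus its Total row (values copied from the question)
actes <- tibble(
  regroupement = c("Ortho (+ rhumato et rachis)", "Chirurgie digestive",
                   "Endoscopies digestives", "Total"),
  actes_2019 = c(29520, 14618, 36079, 129618),
  actes_2020 = c(25451, 12845, 30192, 107951),
  actes_2021 = c(24055, 13735, 42333, 120938)
)

contrib <- actes %>%
  # Contribution in percentage points: (X_t' - X_t) / Y_t * 100,
  # where Y_t is the Total of the base year of each period
  mutate(
    `2020-2021` = (actes_2021 - actes_2020) / actes_2020[regroupement == "Total"] * 100,
    `2019-2021` = (actes_2021 - actes_2019) / actes_2019[regroupement == "Total"] * 100
  ) %>%
  filter(regroupement != "Total") %>%
  select(regroupement, `2020-2021`, `2019-2021`) %>%
  # Long format: one row per specialty and period
  pivot_longer(-regroupement, names_to = "period", values_to = "contribution")

# Non-stacked bars: one bar per period, side by side for each specialty
p <- ggplot(contrib, aes(x = contribution, y = regroupement, fill = period)) +
  geom_col(position = "dodge") +
  labs(x = "Contribution to the growth of the total (percentage points)", y = NULL)
print(p)
```

The bars are drawn horizontally only because the specialty names are long; swapping x and y in aes() gives vertical bars. With the full data, the same mutate() should work on df directly, and the Total row could equally be replaced by sum() over the specialty rows, in line with point 1 above.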


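Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.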

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM