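Calculating the contribution of a variable to the growth of a total in R
I am currently trying to calculate the contribution of a variable to the growth of a total. The formula is as follows: over the period T - T', the contribution of variable X to the growth of the total Y is defined as: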
(Xt/Yt)*((Xt'-Xt)/Xt)*100
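Written out, with X_t and Y_t the values at the start of the period and X_{t'} the value at the end, the definition above can be restated as follows. The second equality is only an algebraic simplification and assumes X_t is not zero:

```latex
\text{contribution}_X
  = \frac{X_t}{Y_t} \cdot \frac{X_{t'} - X_t}{X_t} \cdot 100
  = \frac{X_{t'} - X_t}{Y_t} \cdot 100
```

In this form, the contribution is the change in X expressed as a percentage of the starting total Y_t.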
Here is my dataset:
df <- structure(list(regroupement = c("Autres", "Ortho (+ rhumato et rachis)",
"Rachis", "Chirurgie digestive", "Ophtalmo", "Uro-néphro", "Gynéco",
"ORL Stomato sf bouche et dent", "bouche et dents", "Tissus mou et chir plastique",
"Chir thoracique et vasculaire", "Chir thoracique", "Chir esth et hors sécu",
"Divers chir", "Gastro", "Endoscopies digestives", "Cardio Vasc (médecine)",
"Pneumologie", "Neurologie", "Soins palliatifs", "Vasculaire interventionnel",
"Divers médecine", "Accouchements", "Obstétrique autre (hors IVG)",
"IVG", "Néo nat", "Séances autres", "Total"), actes_2019 = c(10,
29520, 395, 14618, 5589, 6515, 4150, 866, 3458, 2137, 449, 0,
575, 2180, 9179, 36079, 311, 388, 714, 4, 0, 6024, 4028, 294,
292, 1, 1842, 129618), actes_2020 = c(8, 25451, 308, 12845, 4167,
7376, 2994, 337, 2206, 2107, 437, 4, 575, 1477, 7933, 30192,
218, 897, 267, 0, 11, 3740, 3348, 193, 118, 5, 737, 107951),
actes_2021 = c(18, 24055, 106, 13735, 5505, 8196, 3376, 352,
3035, 2571, 511, 8, 689, 1134, 6504, 42333, 161, 272, 138,
7, 0, 4682, 3226, 181, 82, 0, 61, 120938), sejours_2019 = c(4,
5493, 44, 2577, 2502, 1221, 852, 260, 1288, 540, 158, 0,
236, 397, 1631, 6992, 101, 63, 90, 1, 0, 1028, 1455, 148,
246, 1, 1820, 29148), sejours_2020 = c(2, 4946, 34, 2220,
1819, 1220, 574, 94, 801, 554, 140, 1, 221, 269, 1335, 5811,
79, 42, 58, 0, 1, 726, 1371, 109, 98, 5, 720, 23250), sejours_2021 = c(7,
5144, 21, 2523, 2416, 1451, 657, 111, 1106, 649, 162, 1,
278, 264, 1109, 7922, 69, 51, 30, 2, 0, 825, 1259, 108, 77,
0, 54, 26296)), row.names = c(4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L,
24L, 25L, 26L, 27L, 28L, 29L, 30L, 1L), core = structure(list(
regroupement = c("Autres", "Ortho (+ rhumato et rachis)",
"Rachis", "Chirurgie digestive", "Ophtalmo", "Uro-néphro",
"Gynéco", "ORL Stomato sf bouche et dent", "bouche et dents",
"Tissus mou et chir plastique", "Chir thoracique et vasculaire",
"Chir thoracique", "Chir esth et hors sécu", "Divers chir",
"Gastro", "Endoscopies digestives", "Cardio Vasc (médecine)",
"Pneumologie", "Neurologie", "Soins palliatifs", "Vasculaire interventionnel",
"Divers médecine", "Accouchements", "Obstétrique autre (hors IVG)",
"IVG", "Néo nat", "Séances autres"), actes_2019 = c(10,
29520, 395, 14618, 5589, 6515, 4150, 866, 3458, 2137, 449,
0, 575, 2180, 9179, 36079, 311, 388, 714, 4, 0, 6024, 4028,
294, 292, 1, 1842), actes_2020 = c(8, 25451, 308, 12845,
4167, 7376, 2994, 337, 2206, 2107, 437, 4, 575, 1477, 7933,
30192, 218, 897, 267, 0, 11, 3740, 3348, 193, 118, 5, 737
), actes_2021 = c(18, 24055, 106, 13735, 5505, 8196, 3376,
352, 3035, 2571, 511, 8, 689, 1134, 6504, 42333, 161, 272,
138, 7, 0, 4682, 3226, 181, 82, 0, 61), sejours_2019 = c(4,
5493, 44, 2577, 2502, 1221, 852, 260, 1288, 540, 158, 0,
236, 397, 1631, 6992, 101, 63, 90, 1, 0, 1028, 1455, 148,
246, 1, 1820), sejours_2020 = c(2, 4946, 34, 2220, 1819,
1220, 574, 94, 801, 554, 140, 1, 221, 269, 1335, 5811, 79,
42, 58, 0, 1, 726, 1371, 109, 98, 5, 720), sejours_2021 = c(7,
5144, 21, 2523, 2416, 1451, 657, 111, 1106, 649, 162, 1,
278, 264, 1109, 7922, 69, 51, 30, 2, 0, 825, 1259, 108, 77,
0, 54)), class = "data.frame", row.names = 4:30), tabyl_type = "two_way", totals =
"row", class = c("tabyl",
"data.frame"))
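For example, I calculated the contribution of the decrease in the number of acts for the medical specialty "Ortho" between 2020 and 2021 to the total growth in the number of acts between those two years: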
25451/107951 * ((24055 - 25451)/25451)*100
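The same calculation can be wrapped in a small function so that it is easy to reuse for other specialties and periods. This is only a minimal sketch: the function name contribution and its argument names are hypothetical, chosen for illustration.

```r
# Hypothetical helper implementing the formula from the question.
# x_t, x_t1: value of the component at the start and end of the period
# y_t:       value of the total at the start of the period
contribution <- function(x_t, x_t1, y_t) {
  (x_t / y_t) * ((x_t1 - x_t) / x_t) * 100
}

# "Ortho" between 2020 and 2021, using the figures from the dataset above
ortho <- contribution(x_t = 25451, x_t1 = 24055, y_t = 107951)
print(round(ortho, 2))  # -1.29
```

Note that when x_t is 0 (as for "Chir thoracique" in 2019), this literal form divides by zero and returns NaN.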
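I would like to compute this for every specialty over the periods 2021-2020 and 2019-2021, and then draw a bar chart (not stacked), like the second one shown here: http://www.statapprendre.education.fr/insee/croissance/pourquoi/graphique.htm
I think a for loop would be advisable, but I don't really know how to proceed. Can anyone help?
You can do something like this.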
library(tidyverse)
df %>%
  # Drop the "Total" row and keep only a few categories so the plot stays readable
  filter(regroupement != "Total") %>%
  filter(str_detect(regroupement, "(Ortho|dents|vasculaire|chir)")) %>%
  # Reshape the actes_* columns from wide to long; turn "actes_2019" into 2019
  pivot_longer(starts_with("actes"), names_to = "year") %>%
  mutate(year = as.integer(str_remove(year, "actes_"))) %>%
  # Within each category, compute the quantity relative to the previous year
  group_by(regroupement) %>%
  mutate(quantity_of_interest = value / sum(value) * c(NA, diff(value)) / value) %>%
  ungroup() %>%
  # Dodged (non-stacked) bar chart
  ggplot(aes(year, quantity_of_interest, fill = regroupement)) +
  geom_col(position = "dodge") +
  labs(x = "Year", y = "Quantity of interest relative to previous year") +
  theme(legend.position = "bottom")
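Explanation:
1. The two filter() calls drop the "Total" row and keep only some of the regroupement categories, because keeping all of them would make the plot very cluttered (you have 28 categories).
2. The actes_* columns are reshaped from wide to long, and the names are converted to years by removing "actes_" and then applying as.integer.
3. With group_by and diff, the quantity value / sum(value) * c(NA, diff(value)) / value is easy to compute. Note that because we group by regroupement, sum(value) is just the total for each regroupement. If you want the overall total instead (I am not quite clear on this point), you need to remove the group_by() and ungroup() lines.
4. The result is drawn as a bar chart with position = "dodge".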
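If the overall total is what is wanted, as in the formula from the question, one possible variant is sketched below. It is not the code above. It divides the change of each specialty by the "Total" row at the start of the period, which is algebraically the same as the question's formula but avoids dividing by a specialty that starts at 0. The helper name contrib_between and the period column are hypothetical, and the data frame is only a small excerpt of the question's actes_* figures so that the example is self-contained.

```r
library(dplyr)
library(ggplot2)

# Small excerpt of the question's data (actes_* columns only), with the "Total" row
df <- data.frame(
  regroupement = c("Ortho (+ rhumato et rachis)", "Chirurgie digestive",
                   "Endoscopies digestives", "Total"),
  actes_2019 = c(29520, 14618, 36079, 129618),
  actes_2020 = c(25451, 12845, 30192, 107951),
  actes_2021 = c(24055, 13735, 42333, 120938)
)

# Hypothetical helper: contribution (in percentage points) of each specialty
# to the growth of the total between the year columns `from` and `to`
contrib_between <- function(data, from, to) {
  total_from <- data[[from]][data$regroupement == "Total"]
  data %>%
    filter(regroupement != "Total") %>%
    transmute(
      regroupement,
      period = paste(sub("actes_", "", from), sub("actes_", "", to), sep = "-"),
      contribution = (.data[[to]] - .data[[from]]) / total_from * 100
    )
}

# One block of rows per period of interest
contribs <- bind_rows(
  contrib_between(df, "actes_2020", "actes_2021"),
  contrib_between(df, "actes_2019", "actes_2021")
)

# Dodged (non-stacked) bars, one bar per period for each specialty
p <- ggplot(contribs, aes(regroupement, contribution, fill = period)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(x = NULL, y = "Contribution to growth of total (percentage points)")
print(p)
```

With the full data frame from the question, the same two contrib_between() calls should work unchanged, and no explicit for loop is needed. The sejours_* columns could be handled the same way by passing their column names and adjusting the prefix removed in sub().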