Pivoting a dataset for boxplot and computing evolution of a variable

I am trying to draw a plot with 3 boxplots, one per year (N2, N1 and N). The variable I want to plot is "DMS".

library(dplyr)
library(ggplot2)

mean_DMS_2021 <- NCN_dataset %>%
  filter(type_de_sejour == "Hospitalisé") %>%
  group_by(code_2) %>%
  summarise(mean_DMS_hosp_N = mean(as.numeric(DMS_hosp_N)))

mean_DMS_2020 <- NCN_dataset %>%
  filter(type_de_sejour == "Hospitalisé")%>%
  group_by(code_2)%>%
  summarise(mean_DMS_hosp_N1 = mean(as.numeric(DMS_hosp_N1)))

mean_DMS_2019 <- NCN_dataset %>%
  filter(type_de_sejour == "Hospitalisé")%>%
  group_by(code_2)%>%
  summarise(mean_DMS_hosp_N2 = mean(as.numeric(DMS_hosp_N2)))

mean_DMS_hosp_19_20 <- as.data.frame(merge(mean_DMS_2019, mean_DMS_2020))
mean_DMS_hosp <- as.data.frame(merge(mean_DMS_hosp_19_20, mean_DMS_2021))
View(mean_DMS_hosp)

# Distribution of the per-specialty mean DMS for each year, all specialties together.

boxplot(mean_DMS_hosp$mean_DMS_hosp_N2)
boxplot(mean_DMS_hosp$mean_DMS_hosp_N1)
boxplot(mean_DMS_hosp$mean_DMS_hosp_N)

I tried to pivot my data because, ideally, I would like to draw one boxplot of DMS per year, to see how the distribution of this variable changes over the years, and then facet (or use any other way of showing the differences between specialties) per specialty or per "nom". For that I would either keep only the specialties with the highest variance between observations of DMS, or filter to keep only the DMS values that have changed the most over the years. I guess I would have to create several "evolution" variables: the evolution of DMS between N2 and N1, between N1 and N, and the average annual growth rate between N2 and N.

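library(data.table)

# Columns 4-6 are sejours, 7-9 are CA, 10-12 are DMS_hosp, each in the order N2, N1, N.
# Note: melt() numbers the three positions 1, 2, 3, so paste0() labels them
# N1, N2, N3 rather than N2, N1, N.
NCN_dataset_long <- melt(data.table::setDT(NCN_dataset),
                         measure.vars = list(c(4, 5, 6), c(7, 8, 9), c(10, 11, 12)),
                         variable.name = 'time_year',
                         value.name = c('sejour', 'CA', "DMS"))[
  , time_year := paste0('N', time_year)][order(nom, type_de_sejour, code_2, site)]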
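For the "evolution" variables mentioned above, here is a minimal sketch in base R. The data frame `dms`, its column names and its values are made up for illustration, and "average annual growth rate" is read here as the compound rate over the two yearly steps N2 -> N1 -> N, which is only one possible definition.

```r
# Hypothetical toy data: one mean DMS per specialty and year (values are made up).
dms <- data.frame(
  code_2 = c("Ortho", "Cardio", "Neuro Chir"),
  DMS_N2 = c(4.0, 5.0, 8.0),
  DMS_N1 = c(5.0, 5.0, 6.0),
  DMS_N  = c(6.25, 4.05, 2.0)
)

# Relative change from one year to the next.
dms$evol_N2_N1 <- dms$DMS_N1 / dms$DMS_N2 - 1
dms$evol_N1_N  <- dms$DMS_N  / dms$DMS_N1 - 1

# Compound annual growth rate over the two yearly steps N2 -> N1 -> N.
# Assumption: DMS_N2 is non-zero, otherwise the ratio is undefined.
dms$cagr_N2_N <- (dms$DMS_N / dms$DMS_N2)^(1 / 2) - 1

# Rank specialties by the size of the change, largest first.
ranked <- dms[order(-abs(dms$cagr_N2_N)), ]
ranked
```

Taking the first few rows of `ranked` would then give the specialties whose DMS has changed the most, which can be used to filter the long data before plotting. With real data where DMS is 0 in the base year, as in the sample below, these ratios are not defined and such rows would need to be dropped or handled separately.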

Here is the structure of my dataset, for reproducibility:

NCN_dataset <- setDT(structure(list(nom = c("CHRISTOPHE", "CHRISTOPHE", 
"PABLO", "JEAN-MARC", "YVES", 
"GUILLAUME"), type_de_sejour = c("Ambulatoires", 
"Externes", "Ambulatoires", "Ambulatoires", "Ambulatoires", "Ambulatoires"
), code_2 = c("Ortho", "Ortho", "Neuro Chir", "Cardio", "Radio", 
"ARE"), sejours_N2 = c(1046, 0, 4, 6, 4, 4), sejours_N1 = c(1001, 
1, 77, 26, 9, 1), sejours_N = c(1078, 0, 115, 140, 9, 1), CA_N2 = c(609862, 
0, 2002, 3296, 1457, 1253), CA_N1 = c(597436, 24, 119573, 22098, 
3026, 322), CA_N = c(668426, 0, 196852, 134095, 3454, 345), DMS_hosp_N2 = c("0", 
"0", "0", "0", "0", "0"), DMS_hosp_N1 = c("0", "0", "0", "0", 
"0", "0"), DMS_hosp_N = c("0", "0", "0", "0", "0", "0"), site = c("PGS", 
"PGS", "FRA", "FRA", "PGS", "FRA")), row.names = c(NA, -6L), class = c("data.table", 
"data.frame")))

I'll use CA here since your DMS is all zero.

library(ggplot2)
library(tidyr) # pivot_longer
pivot_longer(NCN_dataset, -c(nom, type_de_sejour, code_2, site), names_pattern = "(.*)_(N.*)", names_to = c(".value", "time_year")) |>
  ggplot(aes(x=time_year, y=CA)) +
  geom_boxplot()

[Figure: ggplot boxplot]

You control the orientation by mapping the numeric variable to either the x= or the y= aesthetic.
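As a rough sketch of both points, the example below puts the numeric variable on x to get horizontal boxes and adds one panel per specialty with `facet_wrap()`. The data frame `long` and its CA values are invented; it only mimics the shape that `pivot_longer()` returns. Mapping a discrete variable to y in `geom_boxplot()` assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical long-format data: one row per (specialty, year) observation.
# The CA values are made up for illustration.
long <- data.frame(
  code_2    = rep(c("Ortho", "Cardio"), each = 6),
  time_year = rep(c("N2", "N1", "N"), times = 4),
  CA        = c(10, 12, 15, 11, 13, 14, 3, 4, 8, 2, 5, 9)
)

# Order the years chronologically; as plain text they would sort as N, N1, N2.
long$time_year <- factor(long$time_year, levels = c("N2", "N1", "N"))

# Numeric variable on x gives horizontal boxes;
# facet_wrap() draws one panel per specialty.
p <- ggplot(long, aes(x = CA, y = time_year)) +
  geom_boxplot() +
  facet_wrap(~ code_2)
p
```

Swapping the two aesthetics (`x = time_year, y = CA`) gives vertical boxes again, and `facet_wrap(~ nom)` would split by "nom" instead of by specialty.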
