Time series analysis of diabetic data
I have a dataset that looks like this:
data=
**ID HbA1cRes Year**
1 65 2003
2 125 2008
3 40 2010
4 110 2007
5 125 2006
6 136 2011
7 20 2012
8 58 2009
9 12 2006
10 123 2008
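Patients with HbA1cRes > 65 are classified as "high risk"; all others are classified as "low risk". I am trying to run a time series analysis (to see the rise and fall of high-risk and low-risk cases over time) using the following code, with Year <- data$REport_YrMonth: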
library(tidyverse)
data$risk <- factor(ifelse(data$HbA1cRes > 65, "High risk patients", "Low risk patients"))
ggplot(data, aes(x = Year)) +
  geom_line(aes(y = risk)) +
  labs(title = "Analysis of diabetes' patients status over time",
       y = "Returns %")
However, the output returned is as follows:
Any idea what I am doing wrong here?
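Count the number of "High risk patients" and "Low risk patients" in each Year, and then plot the data. Mapping risk directly to y, as in the question, plots the category label itself rather than how many patients fall into it.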
library(ggplot2)
library(dplyr)

data %>%
  # Label each patient, then count patients per year and risk group
  mutate(risk = factor(ifelse(HbA1cRes > 65,
                              "High risk patients", "Low risk patients"))) %>%
  count(Year, risk) %>%
  ggplot(aes(x = Year, y = n, color = risk)) +
  geom_line() +
  labs(title = "Analysis of diabetes' patients status over time")
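To see what count() passes on to ggplot, here is a minimal sketch built from the ten rows in the question. It assumes the tidyr package is installed. The complete() step is an optional extra rather than part of the answer above: it adds explicit zero rows for year/risk pairs that have no patients, limited to the years that actually occur in the data, so a line drops to zero instead of skipping those years.

```r
library(dplyr)
library(tidyr)

# The ten rows shown in the question
data <- data.frame(
  ID = 1:10,
  HbA1cRes = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  Year = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
)

# One row per observed Year/risk pair, with n = number of patients
counts <- data %>%
  mutate(risk = factor(ifelse(HbA1cRes > 65,
                              "High risk patients", "Low risk patients"))) %>%
  count(Year, risk)

# Optional (an addition, not from the answer): fill in the missing
# Year/risk pairs with n = 0 for the years present in the data
counts_full <- counts %>%
  complete(Year, risk, fill = list(n = 0L))

print(counts_full)
```

Passing counts_full instead of counts to the ggplot() call leaves the rest of the plotting code unchanged.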
The case_when function can be an elegant way to classify the data.
geom_col or geom_density may be better choices than geom_line.
library(tidyverse)

df <- tibble(
  id = 1:10,
  hb = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  year = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
)

# Classify each patient, then count patients per year and risk group
df <- df %>%
  mutate(
    risk = case_when(
      hb > 65 ~ "high risk",
      TRUE ~ "low risk"
    )
  ) %>%
  count(year, risk)

# Side-by-side bars: number of patients per year in each risk group
df %>%
  ggplot(aes(x = year, y = n, group = risk, fill = risk)) +
  geom_col(position = "dodge") +
  labs(
    title = "Analysis of diabetes' patients status over time",
    y = "Number of patients",
    fill = "Risk Status")

# Stacked density scaled to 1: relative share of each risk group over time
df %>%
  ggplot(aes(x = year, fill = risk)) +
  geom_density(position = "fill") +
  labs(
    title = "Analysis of diabetes' patients status over time",
    y = "Proportion",
    fill = "Risk Status")
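One caveat about the density plot above: df has already been reduced to counts, so each year/risk row contributes once to the density regardless of its n. The sketch below is an alternative that keeps one row per patient, so that years with more patients carry more weight. The name patients is introduced here for illustration only, and the data are the same ten rows from the question.

```r
library(dplyr)
library(ggplot2)

# Patient-level table: one row per patient, not yet counted
patients <- data.frame(
  id = 1:10,
  hb = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  year = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
) %>%
  mutate(risk = case_when(hb > 65 ~ "high risk", TRUE ~ "low risk"))

# The density is estimated from individual patients, so a year with
# two high-risk patients contributes twice to the "high risk" curve
p <- ggplot(patients, aes(x = year, fill = risk)) +
  geom_density(position = "fill") +
  labs(
    title = "Analysis of diabetes' patients status over time",
    y = "Proportion",
    fill = "Risk Status")

print(p)
```

With only ten patients the smoothed curves are rough either way, so the bar chart of counts is probably the easier plot to read for data of this size.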