Time series analysis of diabetic data
I have a dataset that looks like this:
```
data =
ID  HbA1cRes  Year
 1        65  2003
 2       125  2008
 3        40  2010
 4       110  2007
 5       125  2006
 6       136  2011
 7        20  2012
 8        58  2009
 9        12  2006
10       123  2008
```
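To make the example reproducible, the table can be entered directly as a data frame. This is a minimal sketch that assumes the column names are exactly those in the table header. The question's real data takes the year from a `REport_YrMonth` column, which is not reproduced here.

```r
# Example data copied from the table above (column names assumed from the header)
data <- data.frame(
  ID       = 1:10,
  HbA1cRes = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  Year     = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
)

str(data)
```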
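Patients with HbA1cRes > 65 are classified as "high risk" and all others as "low risk". I am trying to do a time series analysis (to see how the numbers of high-risk and low-risk cases rise and fall over time) with the following code, where `Year <- data$REport_YrMonth`: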
```r
library(tidyverse)

data$risk <- factor(ifelse(data$HbA1cRes > 65, "High risk patients", "Low risk patients"))

ggplot(data, aes(x = Year)) +
  geom_line(aes(y = risk)) +
  labs(title = "Analysis of diabetes' patients status over time",
       y = "Returns %")
```
However, this is the output it returns:

Any idea what I am doing wrong here?
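Count how many "High risk patients" and "Low risk patients" there are in each year, then plot those counts against `Year`. With `y = risk`, the y-axis is the categorical risk label itself, so `geom_line()` just jumps between the two category levels instead of showing how many patients fall into each group.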
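```r
library(ggplot2)
library(dplyr)

data %>%
  # Classify each patient with the HbA1cRes > 65 threshold
  mutate(risk = factor(ifelse(HbA1cRes > 65,
                              "High risk patients", "Low risk patients"))) %>%
  # One row per Year/risk combination; the number of patients goes in column n
  count(Year, risk) %>%
  # Plot the yearly counts, one coloured line per risk group
  ggplot(aes(x = Year, y = n, color = risk)) +
  geom_line() +
  labs(title = "Analysis of diabetes' patients status over time")
```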
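One caveat: `count()` only returns the Year/risk combinations that actually occur. A year with no high-risk patients therefore has no row at all, and `geom_line()` draws straight across it instead of dropping to zero. The following is a sketch, assuming tidyr is installed, that uses `tidyr::complete()` with `full_seq()` to add the missing combinations with a count of zero. This includes 2004 and 2005, which have no patients in the sample. The name `yearly` is only illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same example data as in the question
data <- data.frame(
  ID       = 1:10,
  HbA1cRes = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  Year     = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
)

yearly <- data %>%
  mutate(risk = factor(ifelse(HbA1cRes > 65,
                              "High risk patients", "Low risk patients"))) %>%
  count(Year, risk) %>%
  # Add every year between the first and last one and every risk level,
  # filling the combinations that had no patients with n = 0
  complete(Year = full_seq(Year, 1), risk, fill = list(n = 0L))

p <- ggplot(yearly, aes(x = Year, y = n, color = risk)) +
  geom_line() +
  geom_point() +
  labs(title = "Analysis of diabetes' patients status over time",
       y = "Number of patients")
p
```

With the zeros filled in, each line now shows the years without cases explicitly rather than interpolating over them.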
The `case_when` function can be an elegant way to classify the data, and `geom_col` or `geom_density` may be better choices than `geom_line` here.
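```r
library(tidyverse)

df <- tibble(
  id   = 1:10,
  hb   = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  year = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
)

df <- df %>%
  mutate(
    # Anything not above 65 falls through to "low risk"
    risk = case_when(
      hb > 65 ~ "high risk",
      TRUE    ~ "low risk"
    )
  ) %>%
  # Number of patients per year and risk group (column n)
  count(year, risk)
```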
```r
# Side-by-side bars: one bar per risk group in each year
df %>%
  ggplot(aes(x = year, y = n, group = risk, fill = risk)) +
  geom_col(position = "dodge") +
  labs(
    title = "Analysis of diabetes' patients status over time",
    y = "Number of patients",
    fill = "Risk Status")
```
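```r
# Smoothed relative share of the risk groups over time.
# weight = n makes a year/risk row with two patients count twice.
df %>%
  ggplot(aes(x = year, fill = risk, weight = n)) +
  geom_density(position = "fill") +
  labs(
    title = "Analysis of diabetes' patients status over time",
    y = "Proportion",
    fill = "Risk Status")
```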
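If the goal is the exact share of high-risk and low-risk patients in each year, rather than a smoothed density, the proportions can be computed from the counts directly and drawn with `geom_col(position = "fill")`. This swaps in stacked, rescaled bars in place of `geom_density`. It is a sketch reusing the same `case_when` classification; the names `shares` and `p` are illustrative.

```r
library(dplyr)
library(ggplot2)

# Counts per year and risk group, built the same way as in the answer above
df <- data.frame(
  id   = 1:10,
  hb   = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  year = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
) %>%
  mutate(risk = case_when(hb > 65 ~ "high risk", TRUE ~ "low risk")) %>%
  count(year, risk)

# Exact share of each risk group within each year (no smoothing)
shares <- df %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# position = "fill" stacks the bars and rescales every year to a total of 1
p <- ggplot(df, aes(x = year, y = n, fill = risk)) +
  geom_col(position = "fill") +
  labs(title = "Analysis of diabetes' patients status over time",
       y = "Proportion of patients", fill = "Risk Status")
p
```

With only ten patients, many years contain a single patient, so their bars are entirely one colour. The `shares` table makes those per-year proportions explicit.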