How to plot % positive cases (y-axis) by collection date (x-axis) and by other factors (R)?
Please help! I have case data that I need to prepare for a report soon, and I just cannot get the graphs to display properly.
From a dataset where each row of CollectionDate is one case record (i.e., multiple rows with the same date mean more cases that day), I want to display the number of positive cases divided by the total (positive + negative) cases for that day as a percent on the y-axis, with collection dates along the x-axis. Then I want to break it down by region.
The goal is to look like this, but in terms of daily positives over the number of tests rather than just positives vs. negatives. I also want to add a horizontal line at 20% on every graph.
ggplot(df_final, aes(x =CollectionDate, fill = TestResult)) +
geom_bar(aes(y=..prop..)) +
scale_y_continuous(labels=percent_format())
Which is, again, close. But the percents are wrong, because each bar's proportion is taken against the counts for all days combined instead of against that day's total.
Then I tried using tally() in the following command to count per region and aggregate:
df_final %>%
group_by(CollectionDate, Region, as.factor(TestResult)) %>%
filter(TestResult == "Positive") %>%
tally()
and I still cannot get the graphs right. Suggestions?
A quick look at my data:
head(df_final)
I can get you halfway there (see the comments in the code for clarification). This code computes the counts per day per region, plotted separately for each region. I think you can tweak it further to calculate the counts per day per county too, and the whole state should be a cakewalk. Good luck with your report.
rm(list = ls())
library(dplyr)
library(magrittr)
library(ggplot2)
library(scales)
library(tidyr) #Needed for the spread() function
#Dummy data
set.seed(1984)
sdate <- as.Date('2000-03-09')
edate <- as.Date('2000-05-18')
dateslist <- as.Date(sample(as.numeric(sdate): as.numeric(edate), 10000, replace = TRUE), origin = '1970-01-01')
df_final <- data.frame(Region = rep_len(1:9, 10000),
CollectionDate = dateslist,
TestResult = sample(c("Positive", "Negative"), 10000, replace = TRUE))
#First tally the positive and negative cases
#by Region, CollectionDate, TestResult in that order
df_final %<>%
group_by(Region, CollectionDate, TestResult) %>%
tally()
#Then
#First spread the counts (in n)
#That is, create separate columns for Negative and Positive cases
#for each Region-CollectionDate combination
#Then calculate their proportions (as shown)
#Now you have Negative and Positive
#percentages by CollectionDate by Region
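df_final %<>%
  spread(key = TestResult, value = n, fill = 0) %>%
  #mutate() evaluates its arguments in order, so store the total first;
  #otherwise Positive would be divided by the already-rescaled Negative
  mutate(Total = Negative + Positive,
         Negative = Negative / Total,
         Positive = Positive / Total)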
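As a side note, spread() still works but tidyr now documents pivot_wider() as its replacement. A minimal sketch of the same reshaping step with pivot_wider(), assuming a small hypothetical data frame (toy, toy_wide, Total and PctPositive are names made up for this illustration, not part of the original data):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same three columns as df_final
toy <- data.frame(
  Region = c(1, 1, 1, 1, 2, 2),
  CollectionDate = as.Date(c("2000-03-09", "2000-03-09", "2000-03-09",
                             "2000-03-10", "2000-03-09", "2000-03-09")),
  TestResult = c("Positive", "Negative", "Negative",
                 "Positive", "Negative", "Negative")
)

toy_wide <- toy %>%
  # One row per Region / CollectionDate / TestResult with its count in n
  count(Region, CollectionDate, TestResult) %>%
  # One column per test result; combinations that never occurred become 0
  pivot_wider(names_from = TestResult, values_from = n, values_fill = 0) %>%
  # Keep the raw counts and add the share of positives as a new column
  mutate(Total = Negative + Positive,
         PctPositive = Positive / Total)

print(toy_wide)
```

Filling missing combinations with 0 matters here: a day with only positive (or only negative) tests would otherwise produce NA in the proportion.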
#Plotting this now
#Since the percentages are available already
#Use geom_col() instead of geom_bar()
df_final %>% ggplot() +
  geom_col(aes(x = CollectionDate, y = Positive, fill = "Positive"),
           position = "identity", alpha = 0.4) +
  geom_col(aes(x = CollectionDate, y = Negative, fill = "Negative"),
           position = "identity", alpha = 0.4) +
  facet_wrap(~ Region, nrow = 3, ncol = 3)
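If stacked bars are acceptable, there may be a shorter route that skips the manual tally: geom_bar(position = "fill") rescales the counts within each date (and each facet) so that every bar sums to 100%. The sketch below assumes raw, untallied rows like the dummy data above; sim and p are hypothetical names, and the dashed 20% line and percent labels are added because the question asks for them.

```r
library(ggplot2)
library(scales)

# Hypothetical raw data: one row per test, as in the original question
set.seed(1984)
sim <- data.frame(
  Region = rep_len(1:9, 2000),
  CollectionDate = as.Date("2000-03-09") + sample(0:70, 2000, replace = TRUE),
  TestResult = sample(c("Positive", "Negative"), 2000, replace = TRUE)
)

p <- ggplot(sim, aes(x = CollectionDate, fill = TestResult)) +
  # position = "fill" normalises the counts per date, so bars span 0 to 1
  geom_bar(position = "fill") +
  # Reference line at 20%
  geom_hline(yintercept = 0.2, linetype = "dashed") +
  # Show the y-axis as percentages
  scale_y_continuous(labels = percent_format()) +
  labs(y = "Share of tests") +
  facet_wrap(~ Region, nrow = 3, ncol = 3)

print(p)
```

With the default level order the Positive segment should end up at the bottom of each bar, so its height can be read directly against the 20% line; if it does not, reordering the factor levels of TestResult is one way to change that.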
Well, I have to say that I am not 100% sure I understood what you want, but this may be helpful anyway.
The data: Since you are new here, I should let you know that providing a simple, reproducible version of your data makes it easier for the rest of us to answer. To do this, you can simulate a data frame or any other object, or use the dput function on it.
library(ggplot2)
library(dplyr)
data <- data.frame(
# date
CollectionDate = sample(
seq(as.Date("2020-01-01"), by = "day", length.out = 15),
size = 120, replace = TRUE),
# result
TestResult = sample(c("Positive", "Negative"), size = 120, replace = TRUE),
# region
Region = sample(c("Region 1", "Region 2"), size = 120, replace = TRUE)
)
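For the dput route mentioned above, a minimal sketch in base R might look like this. The three rows and the names small_example and rebuilt are made up for illustration; on real data one would typically pass something like head(df_final, 20) instead.

```r
# Hypothetical three-row example standing in for a slice of the real data
small_example <- data.frame(
  CollectionDate = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02")),
  TestResult = c("Positive", "Negative", "Negative"),
  Region = c("Region 1", "Region 2", "Region 1")
)

# dput() prints R code that recreates the object; paste that text into the question
dput(small_example)

# Anyone can rebuild the object by evaluating the printed text
dput_text <- paste(capture.output(dput(small_example)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
print(all.equal(small_example, rebuilt))
```

The rest of this answer keeps using the simulated data frame called data.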
With this data, you can proceed as follows to get the plots you want.
# General plot, positive cases proportion
data %>%
count(CollectionDate, TestResult, name = "cases") %>%
group_by(CollectionDate) %>%
summarise(positive_pro = sum(cases[TestResult == "Positive"])/sum(cases)) %>%
ggplot(aes(x = CollectionDate, y = positive_pro)) +
geom_col() +
geom_hline(yintercept = 0.2)
# positive proportion by day within region
data %>%
count(CollectionDate, TestResult, Region, name = "cases") %>%
group_by(CollectionDate, Region) %>%
summarise(
positive_pro = sum(cases[TestResult == "Positive"])/sum(cases)
) %>%
ggplot(aes(x = CollectionDate, y = positive_pro)) +
geom_col() +
# horizontal line at 20%
geom_hline(yintercept = 0.2) +
facet_wrap(~Region)
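A possible shortcut for the summary step: because TestResult == "Positive" is a logical vector, its mean within each group is already the share of positives, so the separate count() can be skipped. The sketch below assumes simulated data shaped like the example above (sim_data, pct_by_region, tests and p are hypothetical names) and also labels the y-axis in percent.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Hypothetical data with the same columns as the simulated example
set.seed(42)  # arbitrary seed, only to make the sketch repeatable
sim_data <- data.frame(
  CollectionDate = sample(
    seq(as.Date("2020-01-01"), by = "day", length.out = 15),
    size = 120, replace = TRUE),
  TestResult = sample(c("Positive", "Negative"), size = 120, replace = TRUE),
  Region = sample(c("Region 1", "Region 2"), size = 120, replace = TRUE)
)

pct_by_region <- sim_data %>%
  group_by(CollectionDate, Region) %>%
  summarise(
    tests = n(),                                    # denominator, kept for reference
    positive_pro = mean(TestResult == "Positive"),  # share of positives that day
    .groups = "drop"
  )

p <- ggplot(pct_by_region, aes(x = CollectionDate, y = positive_pro)) +
  geom_col() +
  # horizontal line at 20%
  geom_hline(yintercept = 0.2, linetype = "dashed") +
  # percent labels on a fixed 0-100% axis
  scale_y_continuous(labels = percent_format(), limits = c(0, 1)) +
  facet_wrap(~Region)

print(p)
```

Keeping the tests column around makes it easy to spot days where a high percentage rests on only a handful of tests.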