
How to plot % positive cases (y-axis) by collection date (x-axis) and by other factors (R)?

Please help! I have case data I need to prepare for a report soon and just cannot get the graphs to display properly.

In my dataset, each row is one case record with a CollectionDate (i.e., multiple rows with the same date mean more cases that day). I want to show the number of positive cases divided by the total (positive + negative) cases for each day as a percentage on the y-axis, with collection dates along the x-axis. Then I want to break this down by region. The goal is a plot that looks like this, but shows daily positives / number of tests rather than just positives vs. negatives. I also want to add a horizontal line at 20% on every graph.
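To pin down the arithmetic being asked for, here is a minimal base R sketch on a tiny hypothetical data frame (the values are made up for illustration): count tests per date and result, then divide each date's row by that date's total.

```r
# Hypothetical toy data: one row per test, as described in the question
tests <- data.frame(
  CollectionDate = as.Date(c("2020-04-01", "2020-04-01", "2020-04-01", "2020-04-01",
                             "2020-04-02", "2020-04-02")),
  TestResult = c("Positive", "Negative", "Negative", "Negative",
                 "Positive", "Negative")
)

# Rows = dates, columns = results, cells = number of tests
counts <- table(tests$CollectionDate, tests$TestResult)

# margin = 1 divides each row by its own row total, i.e. per-day proportions
daily_share <- prop.table(counts, margin = 1)

# Daily percent positive: 1 of 4 tests on Apr 1, 1 of 2 tests on Apr 2
pct_positive <- 100 * daily_share[, "Positive"]
pct_positive
```

The key point is the denominator: each day's positives are divided by that day's tests, not by the positives or tests over all days.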

  • I have tried manipulating the data before, inside, and after the ggplot call, e.g.:
    library(ggplot2)
    library(scales)  # for percent_format()
    ggplot(df_final, aes(x = CollectionDate, fill = TestResult)) +
    geom_bar(aes(y = after_stat(prop))) +
    scale_y_continuous(labels = percent_format())

This is, again, close, but the percentages are wrong: each bar shows that day's count as a proportion of the counts over all days, not as a proportion of the tests on that day.
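One way to get per-day shares without pre-aggregating, sketched here on simulated data (the region names, dates, and positivity rate are assumptions), is ggplot2's `position = "fill"`, which rescales each stacked bar to a total height of 1. With the default factor levels, the Positive segment sits at the bottom of each bar, so its top can be compared directly with the 20% reference line.

```r
library(ggplot2)
library(scales)

# Simulated stand-in for df_final: one row per test (hypothetical values)
set.seed(1)
df_final <- data.frame(
  Region = sample(c("North", "South"), 500, replace = TRUE),
  CollectionDate = sample(seq(as.Date("2020-04-01"), by = "day", length.out = 10),
                          500, replace = TRUE),
  TestResult = sample(c("Positive", "Negative"), 500, replace = TRUE,
                      prob = c(0.2, 0.8))
)

p <- ggplot(df_final, aes(x = CollectionDate, fill = TestResult)) +
  # position = "fill" stacks the counts for each date and rescales them to sum to 1,
  # so each segment is that result's share of the day's tests
  geom_bar(position = "fill") +
  # reference line at 20%
  geom_hline(yintercept = 0.2, linetype = "dashed") +
  scale_y_continuous(labels = percent_format()) +
  facet_wrap(~ Region)
p
```

Because the proportions are computed inside each x position (and each facet panel), this avoids the "share of all days" problem that `prop` has here.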

Then I tried using tally() in the following command to count per region and aggregate:

  df_final %>% 
  group_by(CollectionDate, Region, as.factor(TestResult)) %>% 
  filter(TestResult == "Positive") %>%
  tally()
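Filtering to `"Positive"` before counting discards the negative tests, which are needed for the denominator. A minimal dplyr sketch (toy data is hypothetical) that keeps both in the same group and computes the share in one `summarise()` call:

```r
library(dplyr)

# Hypothetical toy data: one row per test
df_final <- data.frame(
  Region = c("A", "A", "A", "A", "B", "B"),
  CollectionDate = as.Date(c("2020-04-01", "2020-04-01", "2020-04-01",
                             "2020-04-02", "2020-04-01", "2020-04-01")),
  TestResult = c("Positive", "Negative", "Negative", "Positive",
                 "Negative", "Negative")
)

daily <- df_final %>%
  group_by(Region, CollectionDate) %>%
  summarise(tests = n(),                               # denominator: all tests that day
            positives = sum(TestResult == "Positive"), # numerator: positive tests
            pct_positive = positives / tests,          # share, between 0 and 1
            .groups = "drop")
daily
```

`sum()` of a logical vector counts the `TRUE` values, so no separate filter or `tally()` step is needed.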

and I still cannot get the graphs right. Suggestions?

A quick look at my data:

head(df_final)

I can get you halfway there (see the comments in the code for clarification). This code computes the daily proportions of positive and negative cases per region (plotted separately for each region). You should be able to tweak it further to do the same per day per county; the whole state should then be a cakewalk. Good luck with your report.

rm(list = ls())

library(dplyr)
library(magrittr)
library(ggplot2)
library(scales)
library(tidyr) #Needed for the spread() function

#Dummy data
set.seed(1984)

sdate <- as.Date('2000-03-09')  
edate <- as.Date('2000-05-18')
dateslist <- as.Date(sample(as.numeric(sdate): as.numeric(edate), 10000, replace = TRUE), origin = '1970-01-01')

df_final <- data.frame(Region = rep_len(1:9, 10000),
                       CollectionDate = dateslist,
                       TestResult = sample(c("Positive", "Negative"), 10000, replace = TRUE))


#First tally the positive and negative cases
#by Region, CollectionDate, TestResult, in that order
df_final %<>% 
  group_by(Region, CollectionDate, TestResult) %>%
  tally()


#Next, spread the counts (in n) into separate
#Negative and Positive columns for each
#Region-CollectionDate combination
#(fill = 0 where a combination has no cases of one kind).
#Then compute the day's total first and divide both
#columns by it, so neither proportion uses an
#already-overwritten column.
#Now you have Negative and Positive
#proportions by CollectionDate by Region
df_final %<>% 
  spread(key = TestResult, value = n, fill = 0) %>% 
  mutate(Total = Negative + Positive,
         Negative = Negative/Total, 
         Positive = Positive/Total)



#Plotting this now
#Since the percentages are available already
#Use geom_col() instead of geom_bar()
df_final %>% ggplot() + 
  geom_col(aes(x = CollectionDate, y = Positive, fill = "Positive"), 
           position = "identity", alpha = 0.4) + 
  geom_col(aes(x = CollectionDate, y = Negative, fill = "Negative"), 
           position = "identity", alpha = 0.4) +
  facet_wrap(~ Region, nrow = 3, ncol = 3)

This yields a faceted bar chart with one panel per region (3 rows by 3 columns).
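`spread()` still works, but tidyr marks it as superseded by `pivot_wider()`. A minimal sketch of the same reshaping step with `pivot_wider()`, on hypothetical tallied counts where some Region-date combinations have only one kind of result:

```r
library(dplyr)
library(tidyr)

# Hypothetical output of the tally() step: Region 1 has no positives on Apr 2,
# Region 2 has no negatives on Apr 2
counts <- data.frame(
  Region = c(1, 1, 1, 2),
  CollectionDate = as.Date(c("2020-04-01", "2020-04-01", "2020-04-02", "2020-04-02")),
  TestResult = c("Negative", "Positive", "Negative", "Positive"),
  n = c(3L, 1L, 2L, 2L)
)

shares <- counts %>%
  # one column per result; missing combinations become 0 instead of NA
  pivot_wider(names_from = TestResult, values_from = n, values_fill = 0L) %>%
  # compute the total once, then divide both columns by it
  mutate(Total = Negative + Positive,
         Positive = Positive / Total,
         Negative = Negative / Total)
shares
```

Computing `Total` before overwriting either column matters because `mutate()` evaluates its arguments in order, so a later expression sees the already-updated value of an earlier one.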

I have to say I am not 100% sure I understood what you want, but this may be helpful anyway.

The data: since you are new here, note that posting a simple, reproducible version of your data makes it easier for the rest of us to answer. To do this, you can simulate a data frame or any other object, or use the dput() function on it.
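As a sketch of the `dput()` route (the small data frame below is hypothetical): `dput()` prints R code that rebuilds the object, including column types such as `Date`, and that text can be pasted straight into a question. For a large data set, something like `dput(head(df_final, 20))` is usually enough.

```r
# Hypothetical small data frame standing in for df_final
df_small <- data.frame(
  CollectionDate = as.Date(c("2020-04-01", "2020-04-01", "2020-04-02")),
  TestResult = c("Positive", "Negative", "Negative"),
  Region = c("Region 1", "Region 2", "Region 1")
)

# Prints a structure(list(...), class = "data.frame", ...) expression
# that recreates df_small when run
dput(df_small)
```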

library(ggplot2)
library(dplyr)

data <- data.frame(
    # date
    CollectionDate = sample(
        seq(as.Date("2020-01-01"), by = "day", length.out = 15),
        size = 120, replace = TRUE),
    # result
    TestResult = sample(c("Positive", "Negative"), size = 120, replace = TRUE),
    # region
    Region = sample(c("Region 1", "Region 2"), size = 120, replace = TRUE)
)

With this data, you can do the following to get the plots you want:

# Overall plot: proportion of positive cases per day
data %>% 
    count(CollectionDate, TestResult, name = "cases") %>% 
    group_by(CollectionDate) %>% 
    summarise(positive_pro = sum(cases[TestResult == "Positive"])/sum(cases)) %>% 
    ggplot(aes(x = CollectionDate, y = positive_pro)) +
    geom_col() +
    geom_hline(yintercept = 0.2)  


# Positive proportion by day within each region
data %>% 
    count(CollectionDate, TestResult, Region, name = "cases") %>% 
    group_by(CollectionDate, Region) %>% 
    summarise(
        positive_pro = sum(cases[TestResult == "Positive"])/sum(cases),
        .groups = "drop"
    ) %>% 
    ggplot(aes(x = CollectionDate, y = positive_pro)) +
    geom_col() +
    # horizontal line at 20%
    geom_hline(yintercept = 0.2) +
    facet_wrap(~Region)
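If you want to check the numbers behind these panels without dplyr, a base R alternative (sketched on simulated data of the same shape; the seed and sizes are arbitrary) relies on the fact that the mean of a logical vector is the share of `TRUE` values:

```r
# Hypothetical toy data in the same shape as `data` above
set.seed(42)
data <- data.frame(
  CollectionDate = sample(seq(as.Date("2020-01-01"), by = "day", length.out = 5),
                          60, replace = TRUE),
  TestResult = sample(c("Positive", "Negative"), 60, replace = TRUE),
  Region = sample(c("Region 1", "Region 2"), 60, replace = TRUE)
)

# TRUE for positive tests; mean() of this per group = positives / all tests
data$is_positive <- data$TestResult == "Positive"

positive_pro <- aggregate(is_positive ~ CollectionDate + Region,
                          data = data, FUN = mean)
head(positive_pro)
```

Each row of `positive_pro` should match the height of the corresponding bar in the faceted plot.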

