I'm trying to clean up some data ( https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv ) on the COVID-19 novel coronavirus so that I can do various types of analysis (e.g. create a chart of countries with 100 cases over time, or track the death rate over time per country). The source data has the dates as columns and the countries as rows. I transposed the data frame so that I have one column for each country and a single column of dates.
I have attempted to read this dataframe in as a time series object through the following code:
covid19ts = ts(covid19, frequency = 365, start = c(2020,22))
Instead of getting dates as my index column, I get a number from 1 to 47 (the number of days recorded). As a result, I can't create charts or do any meaningful analysis.
I have also tried the following code using the lubridate package with the same results:
covid19ts = ts(covid19, frequency = 365, start= decimal_date(as.Date("2020-01-22")))
How can I turn the ts index into actual dates for charting and analysis?
Or is there a completely different approach that would be better suited to the analysis I'm trying to do?
Thank you for your help.
You could keep the data as a data frame and still do useful plotting. Getting the data into long format first makes that easier.
library(tidyverse)
df <- read.csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv', check.names = FALSE)
df1 <- df %>% pivot_longer(cols = -(1:4))
head(df1)
# A tibble: 6 x 6
#   `Province/State` `Country/Region`   Lat  Long name    value
#   <fct>            <fct>            <dbl> <dbl> <chr>   <int>
# 1 Anhui            Mainland China    31.8  117. 1/22/20     1
# 2 Anhui            Mainland China    31.8  117. 1/23/20     9
# 3 Anhui            Mainland China    31.8  117. 1/24/20    15
# 4 Anhui            Mainland China    31.8  117. 1/25/20    39
# 5 Anhui            Mainland China    31.8  117. 1/26/20    60
# 6 Anhui            Mainland China    31.8  117. 1/27/20    70
If you want to convert the data into a time series as shown in your post, you could do:
df2 <- df1 %>%
  group_by(`Country/Region`, name) %>%
  summarise(value = sum(value)) %>%
  pivot_wider(names_from = `Country/Region`, values_from = value,
              values_fill = list(value = 0))
ts_data <- xts::xts(df2[-1], as.Date(df2$name, "%m/%d/%y"))
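To illustrate why the dates go missing with ts and what the as.Date call above relies on, here is a minimal base R sketch. The names toy_ts, headers and dates are made up for illustration, and the three header strings simply mimic the date columns of the CSV.

```r
# A ts object stores its time index as plain numbers (decimal years here),
# not as Date values, which is why no calendar dates show up.
toy_ts <- ts(c(1, 9, 15), frequency = 365, start = c(2020, 22))
print(time(toy_ts))

# The CSV headers use a two-digit year, so they need "%y" (lowercase).
headers <- c("1/22/20", "1/23/20", "1/24/20")  # hypothetical sample of headers
dates <- as.Date(headers, format = "%m/%d/%y")
print(dates)

# The parsed values are real Date objects, one day apart.
print(class(dates))
print(diff(dates))
```

A Date vector like this is what xts (or zoo) can use as its index, so plots and subsetting work on actual calendar dates.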
An alternative solution suggested by @G. Grothendieck, relying on zoo, is
library(zoo)
# The dates have a two-digit year (e.g. 1/22/20), so the format uses %y
z <- read.zoo(df1[c(2, 5:6)], index = "name", split = "Country/Region",
              format = "%m/%d/%y", aggregate = sum)
read.zoo avoids all the explicit aggregating and reshaping by tidyverse. We can then use the autoplot function to plot this zoo object.
Rather than use ts or xts objects, this is best suited to a tsibble format like this.
library(tidyverse)
library(tsibble)
library(feasts)
covid19 <- read_csv("time_series_19-covid-Confirmed.csv") %>%
  pivot_longer(cols = -(1:4)) %>%
  mutate(date = lubridate::mdy(name)) %>%
  select(-name) %>%
  rename(
    "Region" = `Province/State`,
    "Country" = `Country/Region`
  ) %>%
  as_tsibble(key = c(Region, Country), index = date)
# Plot by country
covid19 %>%
  filter(Country %in% c("China", "Italy", "Iran", "South Korea")) %>%
  group_by(Country) %>%
  summarise(value = sum(value)) %>%
  autoplot(value)
Created on 2020-03-09 by the reprex package (v0.3.0)
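For the "countries with 100 cases over time" idea from the question, one possible approach is to work on the long-format data directly. The sketch below uses only base R and a small made-up data frame (toy, with invented countries A and B and invented counts) rather than the real file, so the column names and the threshold of 100 are assumptions for illustration.

```r
# Made-up long-format data: two provinces for country B, one for country A.
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "B", "B", "B", "B"),
  region  = c("A1", "A1", "A1", "B1", "B1", "B1", "B2", "B2", "B2"),
  name    = rep(c("1/22/20", "1/23/20", "1/24/20"), times = 3),
  value   = c(10, 60, 120, 40, 50, 70, 30, 60, 90),
  stringsAsFactors = FALSE
)
toy$date <- as.Date(toy$name, format = "%m/%d/%y")

# Sum the provinces within each country for every date.
by_country <- aggregate(value ~ country + date, data = toy, FUN = sum)

# Keep only the country-days with at least 100 cases, sorted by country and date.
over_100 <- by_country[by_country$value >= 100, ]
over_100 <- over_100[order(over_100$country, over_100$date), ]

# The first remaining row per country is the day it reached the threshold.
first_100 <- over_100[!duplicated(over_100$country), ]
print(first_100)
```

The over_100 data frame still has one row per country and date, so it can be passed to a plotting function as is, while first_100 gives a starting date per country for aligning the curves.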