How can I convert this dataframe into a multiple time series object in R?

Question

I'm trying to clean up some data (https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv) on the COVID-19 novel coronavirus so I can do various kinds of analysis (e.g. chart the countries that have reached 100 cases over time, or track the death rate over time per country). The data has the dates as columns and the countries as rows, so I transposed the data frame to get one column per country and a single column of dates.

I tried to convert this data frame to a time series object with the following code:

covid19ts = ts(covid19, frequency = 365, start = c(2020,22))

Instead of dates, the resulting index is a number from 1 to 47 (the number of days recorded), so I can't create charts or do any meaningful analysis.

I have also tried the following code using the lubridate package with the same results:

covid19ts = ts(covid19, frequency = 365, start= decimal_date(as.Date("2020-01-22")))
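Both attempts behave the same way because ts() keeps only numeric values plus a numeric time index measured in fractional years; it has no notion of Date values. A minimal base-R sketch of that, assuming a toy version of the transposed data frame with a Date column called `date` and two made-up country columns (hypothetical names and values), shows how calendar dates can be rebuilt from time():

```r
# Toy stand-in for the transposed table: hypothetical column names and values
covid19 <- data.frame(
  date     = seq(as.Date("2020-01-22"), by = "day", length.out = 4),
  CountryA = c(1, 2, 4, 8),
  CountryB = c(0, 1, 1, 3)
)

# Drop the Date column: ts() stores only the numeric values and a numeric time index
covid19ts <- ts(covid19[-1], frequency = 365, start = c(2020, 22))

# time() returns fractional years (2020 + 21/365 for day 22), not Date values
tt <- as.numeric(time(covid19ts))

# Rebuild calendar dates by counting days from 1 Jan 2020
dates <- as.Date("2020-01-01") + round((tt - 2020) * 365)
print(dates)
```

In practice it is usually simpler to keep a date column next to the values, as both answers below do, than to round-trip through ts.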

How can I turn the ts index into actual dates for charting and analysis?

Or is there a completely different approach that would be better for the analysis I'm trying to do?

Thank you for your help.

Answer 1

You could keep the data as a data frame and plot it directly, after reshaping it into long format:

library(tidyverse)
df <- read.csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv', check.names = FALSE)
# Columns 1-4 are Province/State, Country/Region, Lat and Long; every other column is a date
df1 <- df %>% pivot_longer(cols = -(1:4))
head(df1)

# A tibble: 6 x 6
#  `Province/State` `Country/Region`   Lat  Long name    value
#  <fct>            <fct>            <dbl> <dbl> <chr>   <int>
#1 Anhui            Mainland China    31.8  117. 1/22/20     1
#2 Anhui            Mainland China    31.8  117. 1/23/20     9
#3 Anhui            Mainland China    31.8  117. 1/24/20    15
#4 Anhui            Mainland China    31.8  117. 1/25/20    39
#5 Anhui            Mainland China    31.8  117. 1/26/20    60
#6 Anhui            Mainland China    31.8  117. 1/27/20    70
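For readers without tidyverse, the same wide-to-long reshape can be sketched in base R. This is a manual alternative to pivot_longer, not what the answer uses, and it assumes a toy table with the CSSE layout (four ID columns followed by one column per date; the values are made up):

```r
# Toy table with the CSSE layout; values are hypothetical
wide <- data.frame(
  `Province/State` = c("A", "B"),
  `Country/Region` = c("X", "X"),
  Lat  = c(0, 1),
  Long = c(0, 1),
  `1/22/20` = c(1, 0),
  `1/23/20` = c(2, 1),
  check.names = FALSE
)

date_cols <- names(wide)[-(1:4)]
n <- nrow(wide)

# Repeat the four ID columns once per date column, then stack the date
# columns on top of each other (unlist reads a data frame column by column)
long <- data.frame(
  wide[rep(seq_len(n), times = length(date_cols)), 1:4],
  name  = rep(date_cols, each = n),
  value = unlist(wide[date_cols], use.names = FALSE),
  check.names = FALSE
)
rownames(long) <- NULL

# Parse the m/d/yy column names into real Date values
long$date <- as.Date(long$name, format = "%m/%d/%y")
print(long)
```

Parsing `name` into a Date column right away is what makes date-aware plotting and filtering possible later.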

If you want to convert the data into a wide time series, as in your post, you could do:

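# Sum the provinces within each country, then spread countries into columns
df2 <- df1 %>%
         group_by(`Country/Region`, name) %>%
         summarise(value = sum(value)) %>%
         pivot_wider(names_from = `Country/Region`, values_from = value, 
         values_fill = list(value = 0))

# Parse the m/d/yy date strings and index the series by real dates
ts_data <- xts::xts(df2[-1], as.Date(df2$name, "%m/%d/%y"))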
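The group_by/summarise/pivot_wider step sums provinces within each country and spreads the countries into columns. A base-R sketch of the same aggregation uses tapply to build a date-by-country matrix; it assumes a toy long table with hypothetical columns `country`, `name` and `value` rather than the real CSV:

```r
# Toy long-format rows (hypothetical): X has two provinces, Y has no row for 1/22/20
long <- data.frame(
  country = c("X", "X", "X", "X", "Y"),
  name    = c("1/22/20", "1/22/20", "1/23/20", "1/23/20", "1/23/20"),
  value   = c(1, 2, 3, 4, 6)
)

# Sum over provinces for every (date, country) pair; missing pairs become 0,
# mirroring values_fill = list(value = 0)
wide <- tapply(long$value, list(long$name, long$country), sum, default = 0)

# Row names are m/d/yy strings, which sort as text ("1/3/20" after "1/22/20"),
# so reorder the rows chronologically
dates <- as.Date(rownames(wide), format = "%m/%d/%y")
wide <- wide[order(dates), , drop = FALSE]
print(wide)
```

The matrix keeps the date strings as row names, so it can be passed to xts or zoo together with `sort(dates)` as the index.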

An alternative suggested by @G. Grothendieck relies on zoo:

library(zoo)
# Dates in the CSV look like 1/22/20, so the year needs %y (two digits), not %Y
z <- read.zoo(df1[c(2, 5:6)], index = "name", split = "Country/Region", 
              format = "%m/%d/%y", aggregate = sum)

read.zoo handles the aggregation and reshaping itself, so none of the explicit tidyverse steps are needed. The resulting zoo object can then be plotted with autoplot.

Answer 2

Rather than using ts or xts objects, this data is best suited to the tsibble format:

library(tidyverse)
library(tsibble)
library(feasts)

# Reads a local copy of the CSV linked in the question
covid19 <- read_csv("time_series_19-covid-Confirmed.csv") %>%
  pivot_longer(cols = -(1:4)) %>%
  mutate(date = lubridate::mdy(name)) %>%
  select(-name) %>%
  rename(
    "Region" = `Province/State`,
    "Country" = `Country/Region`
  ) %>%
  as_tsibble(key = c(Region, Country), index = date)

# Plot by country
covid19 %>%
  # The CSV has labelled China as "Mainland China" (see the output in Answer 1)
  filter(Country %in% c("China", "Mainland China", "Italy", "Iran", "South Korea")) %>%
  group_by(Country) %>%
  summarise(value = sum(value)) %>%
  autoplot(value)

Created on 2020-03-09 by the reprex package (v0.3.0)
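as_tsibble() refuses data in which a key/index combination appears more than once, and tsibble reports gaps in a series separately (for example through has_gaps()). A base-R sketch of both checks before converting, assuming a toy long table with the renamed Region, Country and date columns (the values are hypothetical):

```r
# Toy long-format rows with the column names used in the pipeline above (hypothetical values)
covid_long <- data.frame(
  Region  = c(NA, NA, "Hubei", "Hubei"),
  Country = c("Italy", "Italy", "Mainland China", "Mainland China"),
  date    = as.Date(c("2020-01-22", "2020-01-23", "2020-01-22", "2020-01-23")),
  value   = c(1, 2, 3, 4)
)

# as_tsibble() needs each (key, index) combination to appear only once
key_is_unique <- anyDuplicated(covid_long[c("Region", "Country", "date")]) == 0

# Label each series; paste() keeps NA regions as the string "NA" instead of dropping them
series_id <- paste(covid_long$Region, covid_long$Country, sep = " / ")

# Within each series, consecutive dates should be exactly one day apart
steps <- unlist(lapply(split(covid_long$date, series_id),
                       function(d) as.numeric(diff(sort(d)), units = "days")))
is_daily <- all(steps == 1)
print(c(key_is_unique = key_is_unique, is_daily = is_daily))
```

Countries without a province have NA in Region, which is why the series label is built with paste() rather than interaction(), whose NA levels split() would silently drop.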

2 answers

solution1
3 2020-03-09 06:01:32

solution2
1 2020-03-09 06:54:37