
How can I convert this dataframe into a multiple time series object in R?

I'm trying to clean up some data (https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv) on the COVID-19 novel coronavirus so I can do various types of analysis (e.g., create a chart of countries with 100 cases over time, or track the death rate over time per country). The data has the dates as columns and the countries as rows. I transposed the data frame so that I got a column for each country and a single column of dates, as shown below.

[Image: the transposed data frame, with one column per country and a column of dates]

I tried to convert this data frame into a time series object with the following code:

covid19ts = ts(covid19, frequency = 365, start = c(2020,22))

The result is shown below. Instead of dates as the index, I get the numbers 1 to 47 (one per day recorded), which means I can't create charts or do any meaningful analysis.

[Image: the resulting ts object, indexed 1 to 47 instead of by date]

I have also tried the following code using the lubridate package with the same results:

covid19ts = ts(covid19, frequency = 365, start= decimal_date(as.Date("2020-01-22")))
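A minimal sketch of why both attempts behave this way, using three made-up values instead of the real data: a ts object stores its time index as plain numbers (fractions of a year when frequency = 365), never as Date values, so calendar dates have to be reconstructed by hand. The conversion below assumes daily observations counted from 2020-01-01, matching start = c(2020, 22) above.

```r
# Made-up daily counts; start = c(2020, 22) means "22nd observation of 2020"
x <- ts(c(1, 9, 15), frequency = 365, start = c(2020, 22))

# The index is numeric: 2020 + 21/365, 2020 + 22/365, 2020 + 23/365
idx <- as.numeric(time(x))

# Count days from 2020-01-01; round() removes floating-point noise
dates <- as.Date("2020-01-01") + round((idx - 2020) * 365)
dates
# [1] "2020-01-22" "2020-01-23" "2020-01-24"
```

Keeping the dates in a separate Date column, or using a structure whose index can itself be a Date (such as zoo or xts), avoids this round trip.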

How can I get the actual dates into my ts object for charting and analysis?

Or is there a completely different approach that would be better for the analysis I'm trying to do?

Thank you for your help.

You could keep the data as a data frame and still do useful plotting. It helps to first reshape it into long format:

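library(tidyverse)
# check.names = FALSE keeps date headers such as "1/22/20" as-is (instead of "X1.22.20")
df <- read.csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv', check.names = FALSE)
# Columns 1-4 are Province/State, Country/Region, Lat and Long; every other (date) column becomes a name/value pair
df1 <- df %>% pivot_longer(cols = -(1:4))
head(df1)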

# A tibble: 6 x 6
#  `Province/State` `Country/Region`   Lat  Long name    value
#  <fct>            <fct>            <dbl> <dbl> <chr>   <int>
#1 Anhui            Mainland China    31.8  117. 1/22/20     1
#2 Anhui            Mainland China    31.8  117. 1/23/20     9
#3 Anhui            Mainland China    31.8  117. 1/24/20    15
#4 Anhui            Mainland China    31.8  117. 1/25/20    39
#5 Anhui            Mainland China    31.8  117. 1/26/20    60
#6 Anhui            Mainland China    31.8  117. 1/27/20    70
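As a sketch of the plotting this enables, here is the same reshaping on a small made-up data frame laid out like the CSSE file (four ID columns, then one column per "m/d/yy" date), followed by parsing the date labels and summing provinces per country. The toy values and the ggplot2 call are illustrative assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up data in the CSSE wide layout: 4 ID columns, then one column per date
wide <- data.frame(
  `Province/State` = c("Anhui", "Beijing", ""),
  `Country/Region` = c("Mainland China", "Mainland China", "Italy"),
  Lat  = c(31.8, 40.2, 43.0),
  Long = c(117.2, 116.4, 12.0),
  `1/22/20` = c(1, 14, 0),
  `1/23/20` = c(9, 22, 0),
  check.names = FALSE
)

long <- wide %>%
  pivot_longer(cols = -(1:4)) %>%                       # date labels -> name, counts -> value
  mutate(date = as.Date(name, format = "%m/%d/%y")) %>% # "1/22/20" -> 2020-01-22
  group_by(`Country/Region`, date) %>%
  summarise(value = sum(value), .groups = "drop")       # add up provinces/states

# One line per country over time; print(p) draws it
p <- ggplot(long, aes(date, value, colour = `Country/Region`)) +
  geom_line()
```

Because `date` is a real Date column, ggplot2 puts a proper date axis on the chart without any conversion to ts.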

If you want to convert the data into a time series as shown in your post, you could do:

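df2 <- df1 %>%
         # Sum provinces/states into one total per country and date
         group_by(`Country/Region`, name) %>%
         summarise(value = sum(value), .groups = "drop") %>%
         # One column per country; a country missing a date gets 0
         pivot_wider(names_from = `Country/Region`, values_from = value, 
         values_fill = list(value = 0))

# Use the parsed "m/d/yy" labels as a Date index
ts_data <- xts::xts(df2[-1], as.Date(df2$name, "%m/%d/%y"))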
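The payoff of an xts object over ts is that its index really is a Date, so rows can be selected by calendar dates. A small sketch on a made-up table shaped like `df2` (the country columns and counts are invented for illustration):

```r
library(xts)

# Made-up wide table shaped like df2: "m/d/yy" labels plus one column per country
wide <- data.frame(
  name  = c("1/31/20", "2/1/20", "2/2/20"),
  Italy = c(2, 2, 2),
  Spain = c(0, 1, 1)
)
ts_data <- xts(wide[-1], order.by = as.Date(wide$name, "%m/%d/%y"))

index(ts_data)                    # Date values, not 1..n
ts_data["2020-02"]                # every row in February 2020
ts_data["2020-01-31/2020-02-01"]  # an inclusive date range
```

The string subsetting follows xts's ISO-8601 style, so no manual date arithmetic is needed.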

An alternative suggested by @G. Grothendieck relies on zoo:

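library(zoo)
# Columns: Country/Region, name (date label), value; "%y" matches two-digit years such as "1/22/20"
z <- read.zoo(df1[c(2, 5:6)], index = "name", split = "Country/Region", 
              format = "%m/%d/%y", aggregate = sum)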

read.zoo avoids all the explicit aggregating and reshaping done with tidyverse above. We can then plot this zoo object with the autoplot function (zoo's method requires ggplot2 to be loaded).
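A self-contained sketch of what `split` and `aggregate` do, using made-up long data with the same three columns as `df1[c(2, 5:6)]` (the countries "A" and "B" are hypothetical). Country "A" has two provinces reported on 1/22/20, which `aggregate = sum` adds together; the format uses `%y` because the labels have two-digit years.

```r
library(zoo)

# Made-up long data: country, "m/d/yy" label, count
long <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  name    = c("1/22/20", "1/22/20", "1/23/20", "1/22/20", "1/23/20"),
  value   = c(1, 2, 5, 0, 3),
  stringsAsFactors = FALSE
)

# split = one column per country; aggregate = sum collapses duplicate dates
z <- read.zoo(long, index = "name", split = "country",
              format = "%m/%d/%y", aggregate = sum)
z

# With ggplot2 loaded, autoplot(z, facets = NULL) draws every country in one panel
```

The result has one row per date and one column per country, which is the same shape as the `xts` object built with tidyverse, but in a single call.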

Rather than using ts or xts objects, this data is best suited to a tsibble format, like this:

library(tidyverse)
library(tsibble)
library(feasts)

# Reads the CSV from the working directory (download it first)
covid19 <- read_csv("time_series_19-covid-Confirmed.csv") %>%
  pivot_longer(cols = -(1:4)) %>%
  mutate(date = lubridate::mdy(name)) %>%
  select(-name) %>%
  rename(
    "Region" = `Province/State`,
    "Country" = `Country/Region`
  ) %>%
  as_tsibble(key = c(Region, Country), index = date)

# Plot by country
covid19 %>%
  filter(Country %in% c("China", "Italy", "Iran", "South Korea")) %>%
  group_by(Country) %>%
  summarise(value = sum(value)) %>%
  autoplot(value)

Created on 2020-03-09 by the reprex package (v0.3.0)
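To see what the key/index structure buys you without downloading the file, here is a sketch on made-up rows in the shape the pipeline above produces (regions, countries and counts are invented). Each Region/Country pair is a separate series, and grouping by Country then summarising collapses provinces into one series per country while keeping a tsibble.

```r
library(dplyr)
library(tsibble)

# Made-up rows: one per Region/Country/date
toy <- tibble::tibble(
  Region  = c("Anhui", "Anhui", "Beijing", "Beijing", "Lombardy", "Lombardy"),
  Country = c("China", "China", "China", "China", "Italy", "Italy"),
  date    = as.Date(c("2020-01-22", "2020-01-23"))[c(1, 2, 1, 2, 1, 2)],
  value   = c(1, 9, 14, 22, 0, 0)
)

# Each Region/Country pair is one series; date is the time index
covid_toy <- as_tsibble(toy, key = c(Region, Country), index = date)

# Sum over regions within each country, per date; the result is still a tsibble
by_country <- covid_toy %>%
  group_by(Country) %>%
  summarise(value = sum(value))
```

Piping `by_country` into feasts' `autoplot(value)`, as in the answer, would draw one line per country.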
