
Importing dates with readr::read_csv()

I want to import a CSV file

today,color
01/02,blue
01/04,green
03/14,orange
07/04,red

using readr to create a column of date objects.

library(tidyverse)
library(lubridate)

read_csv("test.csv", col_types = "Dc") #first attempt
read_csv("test.csv", col_types = cols( #second attempt
         col_date(format = "%m/%d"),
         col_character()))

I figured that my first attempt didn't work because of the non-standard date format, so in my second attempt, I was explicit. Neither succeeded, and both returned the same warning.

Warning: 4 parsing failures.
row   col   expected actual       file
  1 today valid date  01/02 'test.csv'
  2 today valid date  01/04 'test.csv'
  3 today valid date  03/14 'test.csv'
  4 today valid date  07/04 'test.csv'
# A tibble: 4 x 2
  today      color
  <date>     <chr>
1 NA         blue
2 NA         green
3 NA         orange
4 NA         red

How should I structure this import?

A month and day alone do not make a date, so col_date() cannot parse this column: a Date also needs a year. It is simpler to read the column as character:

df1 <- read_csv("test.csv", col_types = "cc") 

Then append the year you need and convert to the Date class:

library(lubridate)
df1$today <- mdy(paste0(df1$today, "/2021"))  # values are month/day, so mdy() rather than dmy()
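The same read-then-convert approach can also be sketched with readr alone. This is a minimal sketch, not the answer's own code: the name csv_text and the variable assumed_year are introduced here for illustration, and the year 2021 is an assumption about which year the data belongs to.

```r
library(readr)

# Same data as in the question, passed as literal text instead of a file name.
csv_text <- "today,color\n01/02,blue\n01/04,green\n03/14,orange\n07/04,red\n"

# Read both columns as character so nothing is parsed as a date yet.
df1 <- read_csv(csv_text, col_types = "cc")

# Append an assumed year, then parse with a format that now includes it.
assumed_year <- 2021  # assumption: pick whichever year the data belongs to
df1$today <- parse_date(paste0(df1$today, "/", assumed_year),
                        format = "%m/%d/%Y")

print(df1)
```

Because the format string now contains %Y, the parser has everything it needs to build a complete date.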

The real problem here is that what we have is not a Date. A Date has a year and the input in the question has no year.
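To see the difference in behaviour, the following sketch parses the same strings with readr and with base R. It assumes that readr's parse_date() behaves like col_date() did in the question, that is, it yields NA when the format has no year; base R's as.Date() is documented to fill an unspecified year with the current one.

```r
library(readr)

x <- c("01/02", "03/14")

# readr: the format has no year, so no complete date can be built.
# suppressWarnings() hides the parsing-failure warning seen in the question.
readr_result <- suppressWarnings(parse_date(x, format = "%m/%d"))

# Base R: strptime-style parsing fills a missing year with the current year.
base_result <- as.Date(x, format = "%m/%d")

print(readr_result)
print(base_result)
```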

1) To work around this, we can define a special class that accepts a month and day without a year in the required format, with the year defaulting to the current year. Use it with read.csv, whose colClasses argument accepts any formal class that has a conversion from character defined via setAs.

Lines is defined in the Note at the end. Replace text=Lines with the filename to read from a file.

setClass("mmdd")
ch2mmdd <- function(from) as.Date(from, format = "%m/%d")
setAs("character", "mmdd", ch2mmdd)

read.csv(text = Lines, colClasses = c("mmdd", "character"))

giving (when run in 2021, since the year follows the current date):

       today  color
1 2021-01-02   blue
2 2021-01-04  green
3 2021-03-14 orange
4 2021-07-04    red
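Since the year in that output comes from the system clock, rerunning the code in a later year gives different dates. A minimal sketch of one way to pin the year, using base R only; mmdd_to_date is a hypothetical helper introduced here, not part of the answer:

```r
# Hypothetical helper: like ch2mmdd, but with the year passed in explicitly.
mmdd_to_date <- function(x, year) {
  as.Date(paste0(year, "/", x), format = "%Y/%m/%d")
}

today <- c("01/02", "01/04", "03/14", "07/04")

# Year comes from the clock, so the result changes from one year to the next.
current <- as.Date(today, format = "%m/%d")

# Year is fixed, so the result is reproducible.
pinned <- mmdd_to_date(today, year = 2021)

print(pinned)
```

The same helper could be registered with setAs in place of ch2mmdd, by wrapping it in a one-argument function that supplies the year.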

2) Alternatively, use read_csv and convert afterwards. This reuses the ch2mmdd function from (1) but does not need the associated S4 class. The trade-off is that the conversion happens after the import, whereas the question seems to want it done while the file is read, as in (1).

Lines %>%
  read_csv %>%
  mutate(today = ch2mmdd(today))

Note

Lines <- "today,color
01/02,blue
01/04,green
03/14,orange
07/04,red"
