
How do I write a function that converts a character vector to a date vector in R?

This seems like a simple enough function to write, but I think I'm misunderstanding the requirements for formal arguments / how R parses and evaluates a function.

I'm trying to write a function that converts any character vector of the form "%m/%d/%Y" (belonging to data.frame df) to a date vector and formats it as "%m/%d/%Y", as follows:

dateformat <- function(x) {
  df$x <- (format(as.Date(df$x, format = "%m/%d/%Y"), "%m/%d/%Y"))
}

I was thinking that...

dateformat(a)

... would just take the "a" as the actual argument for x and plug it into the function, thus resolving as:

 df$a <- (format(as.Date(df$a, format = "%m/%d/%Y"), "%m/%d/%Y"))

However, I get the following error when running dateformat(a):

Error in as.Date.default(df$x, format = "%m/%d/%Y") : 
  do not know how to convert 'df$x' to class “Date”

Can someone please explain why my understanding of formal/actual arguments and/or R function parsing/evaluation is incorrect? Thank you.

Update

Of course, for all the variables I want to convert to dates (e.g., df$a, df$b, df$c), I could just write

df$a <- (format(as.Date(df$a, format = "%m/%d/%Y"), "%m/%d/%Y"))

df$b <- (format(as.Date(df$b, format = "%m/%d/%Y"), "%m/%d/%Y"))

df$c <- (format(as.Date(df$c, format = "%m/%d/%Y"), "%m/%d/%Y"))

But I'm looking to improve my coding skills by writing a more general function to which I could feed a vector of variables. For instance, what if I had df$a to df$z, all character variables that I wanted to convert to date variables? Once I have a proper function, I'd like to run it like so:

for (n in letters) {
  dateformat(n)
}

First, the format(...) function returns a character vector, not a date, so if x is a string,

format(as.Date(x, format = "%m/%d/%Y"), "%m/%d/%Y")

converts x to date and then back to character, as in:

result <- format(as.Date("01/03/2014", format = "%m/%d/%Y"), "%m/%d/%Y")
result
# [1] "01/03/2014"
class(result)
# [1] "character"
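If the goal is a column that really is a Date, one option is to parse once, keep the Date, and call format() only when the value is displayed. A minimal sketch; the vectors x and d are illustrative and not part of the question's data:

```r
# Illustrative input: two date strings in month/day/year form.
x <- c("01/03/2014", "12/25/2014")

# Parse once and keep the result as a Date.
d <- as.Date(x, format = "%m/%d/%Y")
class(d)
# [1] "Date"

# Date arithmetic works on a Date vector, not on the character strings.
d[2] - d[1]
# Time difference of 356 days

# Apply format() only at display time; the stored value stays a Date.
format(d, "%m/%d/%Y")
# [1] "01/03/2014" "12/25/2014"
```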

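Second, assigning to an object such as df inside a function (that is, using it on the left-hand side of an assignment) causes R to create that object in the scope of the function; the object of the same name in the calling environment is left unchanged.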

a <- 2
f <- function(x) a <- x
f(3)
a
# [1] 2

Here, we set a variable, a, to 2. Then, inside the function, we create a new variable a in the scope of the function, set it to x (3), and destroy it when the function returns. So in the global environment a is still 2.
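The same two points can be checked against a data frame. The sketch below uses a small made-up df and a hypothetical function named broken(). It shows that `$` does not evaluate its right-hand side, so df$x inside the question's function means "the column literally named x" (which is NULL here, and would explain why as.Date() could not convert it), and that a replacement made inside a function only touches a local copy:

```r
# Made-up data frame with one character column.
df <- data.frame(a = c("01/03/2014", "02/14/2014"), stringsAsFactors = FALSE)

# 1) `$` takes the name literally; [[ ]] evaluates its argument.
x <- "a"
is.null(df$x)              # TRUE: df has no column called "x"
identical(df[[x]], df$a)   # TRUE: [[ ]] looks up the column named by the value of x

# 2) Assigning to df inside a function modifies a local copy only.
broken <- function(col) {
  df[[col]] <- as.Date(df[[col]], format = "%m/%d/%Y")
  invisible(NULL)          # the modified local df is discarded on return
}
broken("a")
class(df$a)                # still "character" in the global environment
```

This is why a working version has to return the converted values and let the caller do the assignment.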

If you insist on using a dateformat(...) function, this should work:

df <- data.frame(a=paste("01",1:10,"2014",sep="/"),
                 b=paste("02",11:20,"2014",sep="/"),
                 c=paste("03",21:30,"2014",sep="/"))

dateformat <- function(x) as.Date(df[[x]], format = "%m/%d/%Y")
for (n in letters[1:3]) df[[n]] <- dateformat(n)
sapply(df,class)
#      a      b      c 
# "Date" "Date" "Date" 

This will be more efficient though:

df <- as.data.frame(lapply(df,as.Date,format="%m/%d/%Y"))
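Note that the one-liner above converts every column of df, so it only fits a data frame made up entirely of date strings. When only some columns hold dates, a function that takes the data frame and the column names as arguments and returns the modified data frame avoids both that limitation and the reliance on a global df. A rough sketch, where convert_dates and its parameter names are hypothetical:

```r
# Hypothetical helper: convert the named columns of `data` to Date and
# return the modified data frame. Nothing outside the function is changed.
convert_dates <- function(data, cols, format = "%m/%d/%Y") {
  data[cols] <- lapply(data[cols], as.Date, format = format)
  data
}

# Made-up example with two date-string columns and one unrelated column.
df <- data.frame(a  = c("01/03/2014", "01/04/2014"),
                 b  = c("02/11/2014", "02/12/2014"),
                 id = 1:2,
                 stringsAsFactors = FALSE)

# The caller assigns the result back; only a and b are converted.
df <- convert_dates(df, c("a", "b"))
sapply(df, class)
#         a         b        id 
#    "Date"    "Date" "integer" 
```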
