R - Date format conversion using string methods

I currently have a data frame with 15 variables and approximately 3 million rows.

One of the columns is a date column, formatted as yyyymmdd and my goal is to reformat that string as yyyymm01 if dd is >=1 and <=14 and yyyymm02 otherwise.

When I run my code I get

Error in 1:end : NA/NaN argument

and I'm not quite sure why. My code is below.

for(i in 1:end)
{
technical.montday[i] = substr(toString(technical$datadate[i]), start = 1, stop = 6)
technical$datadate[i] =  ifelse((as.integer(substr(toString(technical$datadate[i]),start =     7, stop = 8)) >= 1) && (as.integer(substr(toString(technical$datadate[i]),start = 7, stop =  8))<=14),paste(technical.montday,"01", sep=""), paste(technical.montday,"15", sep="") )
}

One of the columns is a date column, formatted as yyyymmdd and my goal is to reformat that string as yyyymm01 if dd is >=1 and <=14 and yyyymm02 otherwise.

I don't understand your code, but what you describe could be done, e.g., like this:

# suppose DATE is the date column
dd <- as.integer(substr(DATE, 7,8))
DATE <- paste0(substr(DATE, 1, 6), ifelse(dd<=14 & dd>=1, "01", "02"))

The ifelse part could probably be shortened to ifelse(dd<=14, "01", "02"). If you need DATE to be numeric, then add as.numeric or as.integer.
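As a rough sketch, the approach above can be wrapped in a small function with the optional integer conversion built in. The name to_half_month and the as_int argument are hypothetical, made up for illustration only:

```r
# Hypothetical helper: map yyyymmdd to yyyymm01 (days 1-14) or yyyymm02 (days 15+).
to_half_month <- function(date, as_int = FALSE) {
  date <- as.character(date)              # accept numeric or character input
  dd <- as.integer(substr(date, 7, 8))    # day of month as an integer
  out <- paste0(substr(date, 1, 6), ifelse(dd <= 14, "01", "02"))
  if (as_int) as.integer(out) else out    # optionally return integers
}

to_half_month(c(20140105, 20140214, 20140215))
```

All of the calls inside are vectorized, so the function can be applied to a whole column at once rather than row by row.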

(edit)

It is probably more efficient to use substring replacement:

DATE <- as.character(DATE)
substr(DATE, 7,8) <- ifelse(substr(DATE, 7,8) > 14, "02", "01")

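(Note that the comparison coerces 14 to the string "14", so substr(DATE,7,8) > 14 is a string comparison. It gives the right answer here because the day is always two zero-padded digits.) It works: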

> DATE <- as.character(20140401:20140430)
> substr(DATE, 7,8) <- ifelse(substr(DATE, 7,8) > 14, "02", "01")
> DATE
 [1] "20140401" "20140401" "20140401" "20140401" "20140401" "20140401"
 [7] "20140401" "20140401" "20140401" "20140401" "20140401" "20140401"
[13] "20140401" "20140401" "20140402" "20140402" "20140402" "20140402"
[19] "20140402" "20140402" "20140402" "20140402" "20140402" "20140402"
[25] "20140402" "20140402" "20140402" "20140402" "20140402" "20140402"
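The comparison can also be made on explicit integers, which avoids depending on how R coerces types when a string is compared with a number. A minimal sketch on the same example data (the variable name day is an assumption, not from the answer):

```r
DATE <- as.character(20140401:20140430)

# Convert the day part to integer explicitly before comparing
day <- as.integer(substr(DATE, 7, 8))

# Overwrite characters 7-8 in place with the half-month code
substr(DATE, 7, 8) <- ifelse(day > 14, "02", "01")

table(DATE)
```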

Perhaps take a different approach:

technical <- data.frame(datadate = c("20140101", "20140203", "20131216", "20131130"), 
    stringsAsFactors = FALSE)

print(technical$datadate)
## [1] "20140101" "20140203" "20131216" "20131130"

technical$datadate <- sapply(technical$datadate, function(x) {

    year.mon <- substr(x, 1, 6)
    dd <- as.numeric(substr(x, 7, 8))

    return(paste(year.mon, ifelse((dd > 14), "02", "01"), sep = "", collapse = ""))

})

print(technical$datadate)
## [1] "20140101" "20140201" "20131202" "20131102"

NOTE: paste0 might be faster than paste with sep = "", and with about 3 million rows that might be significant for your situation. I also chose sapply over an explicit loop for the same reason.
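Since substr, ifelse and paste0 are all vectorized, the same result can probably be had without sapply or an explicit loop. With 3 million rows this is likely to be faster, though it has not been timed here. A sketch on the same example data frame:

```r
technical <- data.frame(datadate = c("20140101", "20140203", "20131216", "20131130"),
    stringsAsFactors = FALSE)

# Day of month for every row at once
dd <- as.integer(substr(technical$datadate, 7, 8))

# Keep yyyymm and append "01" (days 1-14) or "02" (days 15+)
technical$datadate <- paste0(substr(technical$datadate, 1, 6),
    ifelse(dd > 14, "02", "01"))

print(technical$datadate)
## [1] "20140101" "20140201" "20131202" "20131102"
```

If a loop is still wanted, for (i in seq_len(nrow(technical))) gives a well-defined range. In the question, end is never assigned, so 1:end most likely picks up the function end from the stats package, which would explain the NA/NaN error.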
