Comparing two date vectors with function in R to avoid loop and dealing with NA

There is probably a trivial solution to this, but here goes. I am trying to compare two date vectors in R (not originally read in as dates) so as to return: the first value if the second is NA and the first is not missing; the later of the two dates if the second is not missing; or NA if both values are missing. For example, for the data below, I'd like lastdate to be computed as follows:

v1        v2         lastdate
1/2/2006  NA         1/2/2006
1/2/2006  12/2/2006  12/2/2006
NA        NA         NA

I have written a function to avoid looping over each row (85K in these data) as follows:

lastdate <- function(lastdate1,lastdate2){
    if (is.na(lastdate1)==T & is.na(lastdate2)==T) {NA}
    else if (is.na(lastdate2)==T & !is.na(lastdate1)) {as.Date(lastdate1,format="%m/%d/%Y")}
    else {max(as.Date(lastdate2,format="%m/%d/%Y"),as.Date(lastdate1,format="%m/%d/%Y"))}
}
dfbobs$leaveobsdate <- lastdate(as.Date(dfbobs$leavedate1,format="%m/%d/%Y"),as.Date(dfbobs$leavedate2,format="%m/%d/%Y"))

The last line tells it to compare two vectors of dates, but it is not quite right, as I am getting the following warnings:

Warning messages:
1: In if (is.na(lastdate1) == T & is.na(lastdate2) == T) { :
  the condition has length > 1 and only the first element will be used
2: In if (is.na(lastdate2) == T & !is.na(lastdate1)) { :
  the condition has length > 1 and only the first element will be used

I'm sure this is very silly and there's probably a much easier way to do this, but any help would be appreciated.

EDIT: I have now attempted this with an ifelse function to deal with the vectors, as suggested, but the comparison, while working if I type in single values (eg, lastdate("1/1/2006","1/2/2006")), produces NAs if I try it on the dataframe vectors. The code follows:

lastdate <- function(lastdate1,lastdate2){
ifelse(is.na(lastdate1==T) & is.na(lastdate2==T), NA, 
    ifelse(is.na(lastdate2)==T & !is.na(lastdate1), as.Date(lastdate1,format="%m/%d/%Y"), 
        ifelse(!is.na(lastdate2) & !is.na(lastdate1), max(as.Date(lastdate2,format="%m/%d/%Y"),as.Date(lastdate1,format="%m/%d/%Y")),NA)))
}
dfbobs$leaveobsdate <- as.Date(lastdate(as.Date(dfbobs$leavedate1,format="%m/%d/%Y"),as.Date(dfbobs$leavedate2,format="%m/%d/%Y")),origin="1970-01-01")

if is not vectorized: it expects a single condition of length one, which is why only the first element is used. Use ifelse instead.
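A minimal sketch of an ifelse version that is vectorized throughout. It assumes the inputs are character strings in %m/%d/%Y format; the name lastdate_vec and the sample values are illustrative only. Two details seem to matter here: ifelse drops the Date class, so the sketch compares the underlying day counts and converts back at the end; and max collapses a whole vector to a single value (NA if anything is missing), so pmax is used for the row-wise comparison.

```r
# Hypothetical helper: vectorized "last date" using ifelse
lastdate_vec <- function(x1, x2) {
  # Work on day counts since 1970-01-01, because ifelse() strips the Date class
  n1 <- as.numeric(as.Date(x1, format = "%m/%d/%Y"))
  n2 <- as.numeric(as.Date(x2, format = "%m/%d/%Y"))

  out <- ifelse(is.na(n2), n1,             # second missing: take the first (NA if both missing)
         ifelse(is.na(n1), n2,             # first missing: take the second
                pmax(n1, n2)))             # both present: element-wise maximum

  as.Date(out, origin = "1970-01-01")      # restore the Date class
}

# Sample values mirroring the table in the question
v1 <- c("1/2/2006", "1/2/2006", NA)
v2 <- c(NA, "12/2/2006", NA)
res <- lastdate_vec(v1, v2)
print(format(res, "%m/%d/%Y"))
```

Calling it on the data frame columns would then look like dfbobs$leaveobsdate <- lastdate_vec(dfbobs$leavedate1, dfbobs$leavedate2).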

Alternatively, you can use mapply with your existing function:

mapply(lastdate, as.Date(df$leavedate1, ...), as.Date(df$leavedate2, ...))
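As a rough illustration of the mapply route, the sketch below uses a scalar helper (lastdate_one, a hypothetical name) and made-up sample values. One thing to watch: mapply simplifies its result to a plain numeric vector, so the Date class has to be restored afterwards. It also still calls the function once per row, so on 85K rows it may be noticeably slower than a fully vectorized approach.

```r
# Hypothetical scalar helper: expects two length-one dates
lastdate_one <- function(d1, d2) {
  if (is.na(d2)) return(d1)   # also covers "both missing", since d1 is then NA
  if (is.na(d1)) return(d2)
  max(d1, d2)
}

# Sample values mirroring the table in the question
d1 <- as.Date(c("1/2/2006", "1/2/2006", NA), format = "%m/%d/%Y")
d2 <- as.Date(c(NA, "12/2/2006", NA), format = "%m/%d/%Y")

# mapply applies the helper pairwise; the simplified result is numeric
res <- mapply(lastdate_one, d1, d2)

# Convert the day counts back to Date
res <- as.Date(res, origin = "1970-01-01")
print(format(res, "%m/%d/%Y"))
```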

Try this:

First, convert the dates to numeric form (yyyymmdd), so that a larger number means a later date:

v1<-as.character(v1); v2<-as.character(v2);
v1<-as.numeric(strftime(strptime(v1,"%m/%d/%Y"),"%Y%m%d"));
v2<-as.numeric(strftime(strptime(v2,"%m/%d/%Y"),"%Y%m%d"));

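Then compute the result. Use pmax rather than max here: max would return a single value for the whole vector, whereas pmax compares element by element, and with na.rm=TRUE it returns NA only when both values are missing:

result<-pmax(v1,v2,na.rm=TRUE);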

Finally, convert back to the format of your choice:

result<-strptime(result,"%Y%m%d");
result<-strftime(result,"%m/%d/%Y");
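The same idea can probably be sketched without the numeric round trip, since pmax also works on Date vectors. The data frame below is a hypothetical stand-in built from the example table in the question, with the column names used there.

```r
# Hypothetical data frame mirroring the example in the question
dfbobs <- data.frame(
  leavedate1 = c("1/2/2006", "1/2/2006", NA),
  leavedate2 = c(NA, "12/2/2006", NA),
  stringsAsFactors = FALSE
)

# Parse the character columns as dates
d1 <- as.Date(dfbobs$leavedate1, format = "%m/%d/%Y")
d2 <- as.Date(dfbobs$leavedate2, format = "%m/%d/%Y")

# Element-wise maximum; na.rm = TRUE keeps the non-missing date when only
# one is present and leaves NA when both are missing
dfbobs$leaveobsdate <- pmax(d1, d2, na.rm = TRUE)

print(format(dfbobs$leaveobsdate, "%m/%d/%Y"))
```

This keeps the result as a Date column, so it can be formatted or compared later without further conversion.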
