
R date sequence - increase not by days but by row/observation

I'm trying to select a date range from a data frame (and later also per participant in that data frame). This is relatively easy if you want to extend the date range by a number of days, for example.

My problem is that I don't want to extend the range by days but by rows, to see when 100 observations had been made. I guess the problem is that the days in my data frame are not consecutive; otherwise I could just use lubridate and do min(as.Date(data$date)) + days(100).

I have also tried seq.Date(min(as.Date(data$date)), length.out = 100, by = 1), but that does not work either.

Here is some sample data:

    dates <- data.frame(date = c("2015-01-08", "2015-01-05", "2015-01-05",
        "2014-12-22", "2014-11-08", "2014-11-01", "2014-10-24", "2014-10-24",
        "2014-10-18", "2014-09-26", "2014-09-21", "2014-09-19", "2014-08-14",
        "2014-08-08", "2014-08-08", "2014-07-10", "2014-07-10", "2014-06-23",
        "2014-06-20", "2014-06-13", "2014-06-11", "2014-06-07", "2014-06-03",
        "2014-06-02", "2014-05-23", "2014-05-16", "2014-05-02", "2014-04-25",
        "2014-04-11", "2014-04-09", "2014-04-01", "2014-03-27", "2014-03-25",
        "2014-03-20", "2014-03-14", "2014-03-06", "2014-03-01"))

Now, when I run seq.Date(min(as.Date(dates$date)), length.out = 20, by = 1), I do get twenty dates:

     [1] "2014-03-01" "2014-03-02" "2014-03-03" "2014-03-04" "2014-03-05" "2014-03-06" "2014-03-07"
     [8] "2014-03-08" "2014-03-09" "2014-03-10" "2014-03-11" "2014-03-12" "2014-03-13" "2014-03-14"
    [15] "2014-03-15" "2014-03-16" "2014-03-17" "2014-03-18" "2014-03-19" "2014-03-20"

BUT: those are consecutive calendar dates that do not match the dates in the data frame, so I have no way of telling when 100 observations had been made, counting from the earliest date.
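One way to see the mismatch is to check which of the generated calendar days actually appear in the data. This sketch uses only the six March 2014 dates from the sample (a subset chosen to keep it short); obs_dates and calendar are illustrative names.

```r
# Six March 2014 observation dates taken from the sample data (subset for brevity)
obs_dates <- as.Date(c("2014-03-01", "2014-03-06", "2014-03-14",
                       "2014-03-20", "2014-03-25", "2014-03-27"))

# 20 consecutive calendar days starting at the earliest observation
calendar <- seq(min(obs_dates), by = "day", length.out = 20)

# Keep only the calendar days on which an observation was actually made
calendar[calendar %in% obs_dates]
```

Only 4 of the 20 generated days (2014-03-01, 2014-03-06, 2014-03-14 and 2014-03-20) match an observation, so counting calendar days says nothing about how many observations have been made.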

Any help would be greatly appreciated! I'm sure I can't be the only one who has run into this issue, but I could not find anything here.

You can use the following:

    N = 20 # position of the observation to compare with the 1st (earliest) one
    diff(sort(as.Date(dates$date))[c(1,N)])
    # Time difference of 114 days

Breaking this down:
1) sort(as.Date(dates$date)) converts the character vector to Date and sorts it in ascending order.
2) [c(1,N)] picks out the earliest (1st) date and the Nth-earliest date. Every row counts as one observation, so a date that occurs twice counts twice; if there are fewer than N rows, the Nth element is NA.
3) diff() calculates the difference between those two dates.
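The three steps can be wrapped in a small helper that also reports the date of the Nth observation. This is a sketch, not part of the original answer; nth_observation_gap is a hypothetical name.

```r
# Sample data from the question (37 rows, some dates repeated)
dates <- data.frame(date = c("2015-01-08", "2015-01-05", "2015-01-05",
  "2014-12-22", "2014-11-08", "2014-11-01", "2014-10-24", "2014-10-24",
  "2014-10-18", "2014-09-26", "2014-09-21", "2014-09-19", "2014-08-14",
  "2014-08-08", "2014-08-08", "2014-07-10", "2014-07-10", "2014-06-23",
  "2014-06-20", "2014-06-13", "2014-06-11", "2014-06-07", "2014-06-03",
  "2014-06-02", "2014-05-23", "2014-05-16", "2014-05-02", "2014-04-25",
  "2014-04-11", "2014-04-09", "2014-04-01", "2014-03-27", "2014-03-25",
  "2014-03-20", "2014-03-14", "2014-03-06", "2014-03-01"))

# Hypothetical helper: date of the Nth observation and days elapsed since the 1st
nth_observation_gap <- function(date_strings, N) {
  d <- sort(as.Date(date_strings))  # ascending, duplicates kept
  # d[N] is NA (not an error) when there are fewer than N observations
  list(nth_date = d[N],
       days = as.numeric(d[N] - d[1], units = "days"))
}

g20 <- nth_observation_gap(dates$date, 20)
g20$nth_date  # 20th observation: 2014-06-23
g20$days      # 114 days after 2014-03-01

g100 <- nth_observation_gap(dates$date, 100)  # only 37 rows, so both fields are NA
```

Because indexing past the end of a vector in R returns NA, asking for N = 100 on this 37-row sample yields NA rather than an error.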

Thanks to @dww's help, I was able to write the following function, which works beautifully (feel free to use it):

    time_to_100 <- function(dataframe, N = 100) {
      # N = number of observations you want to 'check'

      # one result per participant (part_id may be a factor or character)
      ids <- unique(as.character(dataframe$part_id))
      output <- setNames(numeric(length(ids)), ids)

      for (part in ids) {
        # created = the date column
        part_dates <- sort(as.Date(dataframe$created[dataframe$part_id == part]))
        # days between the 1st and Nth observation; NA if fewer than N rows
        output[[part]] <- as.numeric(diff(part_dates[c(1, N)]), units = "days")
      }

      return(output)
    }
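The same per-participant calculation can also be written without an explicit loop: split the dates by participant and apply a function to each group with vapply(). This is a sketch assuming columns named part_id and created, as in the function above; days_to_nth and the small obs data frame are hypothetical.

```r
# Hypothetical example data: two participants with unsorted, partly repeated dates
obs <- data.frame(
  part_id = factor(c("p1", "p1", "p1", "p2", "p2", "p2", "p2")),
  created = c("2014-03-01", "2014-03-10", "2014-03-05",
              "2014-04-01", "2014-04-01", "2014-04-15", "2014-05-01")
)

# Days from each participant's 1st to Nth observation (NA if fewer than N rows)
days_to_nth <- function(df, N = 100) {
  # split() keeps the Date class, giving one Date vector per participant
  by_part <- split(as.Date(df$created), df$part_id)
  vapply(by_part, function(d) {
    d <- sort(d)
    as.numeric(d[N] - d[1], units = "days")
  }, numeric(1))
}

res <- days_to_nth(obs, N = 3)  # named numeric vector: p1 = 9, p2 = 14
```

split() visits each participant exactly once, and vapply() guarantees a numeric vector named by participant, which is easy to bind back onto a summary table.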
