
R: Alternative to for-loop possible?

I have a data.frame with two columns indicating the start and the end-date of a certain event, something like this:

      [,1]  [,2]
[1,] 14260 14317
[2,] 13515 13694
[3,] 13696 13878
[4,] 13879 14060
[5,] 14061 14243
[6,] 14244 14426

I'd like to obtain a vector, containing per day (in a period from the minimum until the maximum date in this data.frame) the number of events occurring on that day.

I guess a for-loop would be the obvious way to solve this: for each row, I would add one to a pre-defined vector holding the current count of events per day, but only for the days between [,1] and [,2].
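For reference, the loop described above could look roughly like the following. This is only a sketch: the name events and the column names start and end are made up for illustration, and it assumes both the start and the end day count as part of an event.

```r
# Sample data from the question; `events`, `start` and `end` are hypothetical names.
events <- data.frame(
  start = c(14260, 13515, 13696, 13879, 14061, 14244),
  end   = c(14317, 13694, 13878, 14060, 14243, 14426)
)

first_day <- min(events$start)
last_day  <- max(events$end)

# One counter per day in the analysed period, all starting at zero.
counts <- integer(last_day - first_day + 1)

for (i in seq_len(nrow(events))) {
  # Positions in `counts` covered by event i (start and end day included).
  idx <- (events$start[i]:events$end[i]) - first_day + 1
  counts[idx] <- counts[idx] + 1L
}

# counts[1] is the number of events on the first day (13515).
table(counts)
```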

However, I'd like to find code that runs more efficiently in R. I have been experimenting with the apply function for quite some time but can't find a feasible way to do it.

In the end, I hope to find something like this:

x = [1,1,..., 2,2,2, ..., 2, 1, 1, 1]

with x[1] being the number of events occurring on the first day that is analyzed (day 13515 when considering the example above)

Thanks!

If test is your data frame (for a matrix, use test[, 1] and test[, 2] instead of test[[1]] and test[[2]]), first create the all_days vector as a sequence:

all_days <- seq( from = min(test[[1]]), to = max(test[[2]]))

Then count the events that cover each day:

events_in_days <-
  sapply(all_days, function(x) {
    sum( x >= test[[1]] & x <= test[[2]] )
  })

The result is in events_in_days.
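To try this on the sample data from the question, a self-contained version might look like the sketch below. The column names start and end are assumptions, and vapply is swapped in for sapply only so that the return type is checked; the counting logic is the same.

```r
# Sample data from the question; the column names are assumed for illustration.
test <- data.frame(
  start = c(14260, 13515, 13696, 13879, 14061, 14244),
  end   = c(14317, 13694, 13878, 14060, 14243, 14426)
)

all_days <- seq(from = min(test[[1]]), to = max(test[[2]]))

# For each day, count the rows whose interval contains that day.
events_in_days <- vapply(
  all_days,
  function(x) sum(x >= test[[1]] & x <= test[[2]]),
  integer(1)
)

# Label each count with its day so single days are easy to look up.
names(events_in_days) <- all_days
events_in_days[c("13515", "13695", "14260")]  # 1, 0 and 2 events respectively
```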

You may want to check the <= and >= conditions to decide whether to include the first and/or the last day of each interval (I included both).
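To see what the choice of boundary conditions changes, the two conventions can be compared side by side. The helper count_events and its include_end argument are hypothetical names introduced only for this sketch.

```r
# Hypothetical helper: count events per day, optionally excluding the end day.
count_events <- function(days, start, end, include_end = TRUE) {
  vapply(days, function(x) {
    if (include_end) sum(x >= start & x <= end) else sum(x >= start & x < end)
  }, integer(1))
}

# Sample data from the question.
start <- c(14260, 13515, 13696, 13879, 14061, 14244)
end   <- c(14317, 13694, 13878, 14060, 14243, 14426)
days  <- seq(min(start), max(end))

inclusive <- count_events(days, start, end)
exclusive <- count_events(days, start, end, include_end = FALSE)

# The two conventions differ only on the days on which an event ends.
days[inclusive != exclusive]
```

With the sample data, the counts differ on exactly the six end days, so the choice only matters if an event should not count on its last day.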

To see how many days have each number of events, call table:

cbind(table(events_in_days))

  [,1]
0    1
1  853
2   58

