
Updating empirical cumulative function

I have the following problem:

Given a stream of observations, find, each time a new observation arrives, the number of observations seen so far that are less than or equal to the most recent one. For example, if the streaming observations are

8, 1, 10, 3, 9, 7, 4, 5, 6, 2

then we have the following updates

  1. Observations: 8. There is 1 observation less than or equal to 8.
  2. Observations: 8, 1. There is 1 observation less than or equal to 1.
  3. Observations: 8, 1, 10. There are 3 observations less than or equal to 10.
  4. ...

For the whole stream, the result would be

1, 1, 3, 2, 4, 3, 3, 4, 5, 2

The solution should be very fast, as I am working with a huge dataset.

Use a for loop, but run it in the reverse direction. I haven't tested it, but I think it is faster.

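xx <- c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2)
res <- vector('integer', length = length(xx))
# Walk backwards: at step i, xx holds exactly the first i observations
for (i in rev(seq_along(xx))) {
  res[i] <- sum(xx[i] >= xx)  # count of observations <= the i-th one
  xx <- xx[-i]                # drop the i-th observation before moving to i-1
}
res
[1] 1 1 3 2 4 3 3 4 5 2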

You can use sapply:

vec <- c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2)

sapply(seq_along(vec), function(x) sum(vec[seq(x)] <= vec[x]))
# [1] 1 1 3 2 4 3 3 4 5 2

Since performance is important, you can also use vapply. It might be faster (untested):

vapply(seq_along(vec), function(x) sum(vec[seq(x)] <= vec[x]), integer(1))
# [1] 1 1 3 2 4 3 3 4 5 2
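One way to check the "might be faster" guess is to time both versions on the same data. This is only a rough sketch: the wrapper names count_sapply and count_vapply, the seed, and the vector length are my own assumptions for illustration, and elapsed times will differ from machine to machine.

```r
set.seed(42)          # arbitrary seed, only for reproducibility
vec <- runif(2000)    # hypothetical test data

# The two one-liners above, wrapped in functions (names are made up here)
count_sapply <- function(v) {
  sapply(seq_along(v), function(i) sum(v[seq_len(i)] <= v[i]))
}
count_vapply <- function(v) {
  vapply(seq_along(v), function(i) sum(v[seq_len(i)] <= v[i]), integer(1))
}

# Both versions should agree exactly before the timings mean anything
same <- identical(count_sapply(vec), count_vapply(vec))

# system.time() is base R, so no extra package is needed
t_sapply <- system.time(count_sapply(vec))
t_vapply <- system.time(count_vapply(vec))
print(c(sapply = t_sapply[["elapsed"]], vapply = t_vapply[["elapsed"]]))
```

Both versions still do the same O(n^2) amount of comparison work, so any difference is likely to come only from how the results are collected.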

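I couldn't leave well enough alone, so I created a kludgemonster:

carl <- function(vec) {
  vlen <- length(vec)
  newct <- vector('integer', length = vlen)
  for (j in seq_len(vlen)) {
    # positions from j onward whose value is >= vec[j]; each of them
    # has vec[j] among its "less than or equal" observations
    wins <- which(vec[j:vlen] >= vec[j]) + j - 1
    newct[wins] <- newct[wins] + 1L
  }
  newct  # return the counts (a for loop by itself evaluates to NULL)
}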

It appears to work, but...

Rgames> set.seed(20)
Rgames> vec<-runif(2000)
Rgames> microbenchmark(carl(vec),agstudy(vec),times=10)
Unit: milliseconds
         expr      min       lq   median       uq      max neval
    carl(vec) 86.75314 87.55323 88.16816 88.80831 89.65117    10
 agstudy(vec) 70.26213 70.83771 71.06158 71.72247 71.93800    10

Still not quite as good as agstudy's code. Maybe someone can tighten up my loop?
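All of the versions above compare each observation against every earlier one, so they do O(n^2) work. One way to get below that might be a Fenwick tree (binary indexed tree) built over the ranks of the values, which is a different technique from anything posted above. What follows is a minimal sketch, assuming the whole vector is already in memory (the ranks have to be computed up front, so this is not a true one-pass streaming solution) and contains no NA. The name running_le_count is made up for illustration, and I have not benchmarked it against the other answers.

```r
# Running count of observations <= the latest one, using a Fenwick tree
# (binary indexed tree) indexed by the rank of each value.
# Assumes x is fully available in advance and has no NA values.
running_le_count <- function(x) {
  n <- length(x)
  # ties.method = "max" gives tied values the same rank, so earlier
  # equal values are included by the "<=" prefix query below
  r <- as.integer(rank(x, ties.method = "max"))
  tree <- integer(n)   # Fenwick tree of counts per rank
  res <- integer(n)
  for (i in seq_len(n)) {
    # insert the i-th observation: add 1 at position r[i]
    k <- r[i]
    while (k <= n) {
      tree[k] <- tree[k] + 1L
      k <- k + bitwAnd(k, -k)   # move to the next node covering k
    }
    # prefix sum: how many inserted observations have rank <= r[i]
    k <- r[i]
    s <- 0L
    while (k > 0L) {
      s <- s + tree[k]
      k <- k - bitwAnd(k, -k)   # strip the lowest set bit
    }
    res[i] <- s
  }
  res
}

running_le_count(c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2))
# [1] 1 1 3 2 4 3 3 4 5 2
```

Each update and each query touches at most about log2(n) tree nodes, so the total work should grow as O(n log n). Whether the interpreted while loops actually beat the vectorised O(n^2) versions at a given n is something to measure rather than assume; for very large inputs the same idea could be moved to compiled code.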

