I have the following problem:
Given a stream of observations, find, each time a new observation arrives, the number of observations seen so far that are less than or equal to the most recent one (the most recent observation counts itself). For example, if the streaming observations are
8, 1, 10, 3, 9, 7, 4, 5, 6, 2
then the counts, updated after each new observation, would be
1, 1, 3, 2, 4, 3, 3, 4, 5, 2
The solution should be very fast, as I am working with a huge dataset.
Use a for loop, but in the reverse direction. I haven't tested it, but I think it is faster.
xx <- c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2)
res = vector('integer',length=length(xx))
for (i in rev(seq_along(xx))) {
res[i] <- sum(xx[i]>=xx)
xx <- xx[-i]
}
res
[1] 1 1 3 2 4 3 3 4 5 2
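The reverse loop works because, at step i, xx has already been trimmed to the first i observations, so sum(xx[i] >= xx) only sees the values that had arrived by then. A minimal sketch that wraps the same loop in a function so it can be reused or timed; the name count_leq_reverse is hypothetical and not part of the original answer:

```r
# Hypothetical wrapper around the reverse loop above.
count_leq_reverse <- function(xx) {
  res <- integer(length(xx))
  for (i in rev(seq_along(xx))) {
    # xx currently holds observations 1..i, so this counts those <= xx[i]
    res[i] <- sum(xx[i] >= xx)
    # drop the last observation before stepping back to the previous one
    xx <- xx[-i]
  }
  res
}

count_leq_reverse(c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2))
# [1] 1 1 3 2 4 3 3 4 5 2
```

Note that xx[-i] copies the vector on every iteration, so the total work still grows quadratically with the number of observations.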
You can use sapply:
vec <- c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2)
sapply(seq_along(vec), function(x) sum(vec[seq(x)] <= vec[x]))
# [1] 1 1 3 2 4 3 3 4 5 2
Since performance is important, you can also use vapply. It might be faster (untested):
vapply(seq_along(vec), function(x) sum(vec[seq(x)] <= vec[x]), integer(1))
# [1] 1 1 3 2 4 3 3 4 5 2
I couldn't leave well enough alone, so I created a kludgemonster:
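carl <- function(vec) {
  vlen <- length(vec)
  newct <- vector('integer', length = vlen)
  for (j in seq_len(vlen)) {
    # positions j..vlen whose value is >= vec[j]; vec[j] counts toward each of them
    wins <- which(vec[j:vlen] >= vec[j]) + j - 1
    newct[wins] <- newct[wins] + 1L
  }
  newct  # return the counts explicitly; a bare for loop returns NULL
}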
It appears to work, but...
Rgames> set.seed(20)
Rgames> vec<-runif(2000)
Rgames> microbenchmark(carl(vec),agstudy(vec),times=10)
Unit: milliseconds
         expr      min       lq   median       uq      max neval
    carl(vec) 86.75314 87.55323 88.16816 88.80831 89.65117    10
 agstudy(vec) 70.26213 70.83771 71.06158 71.72247 71.93800    10
Still not quite as good as agstudy's code. Maybe someone can tighten up my loop?
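All of the approaches above compare each observation against every earlier one, so their cost grows quadratically with the length of the stream. One possible way to reduce that, sketched here and not taken from any of the answers, is a Fenwick (binary indexed) tree over the ranks of the values, which needs roughly n log n steps. The function name count_leq_fenwick is hypothetical, the sketch assumes there are no NA values, and it has not been benchmarked against the code above; a pure R loop carries interpreter overhead, so any gain would need to be measured on real data.

```r
# Sketch: count, for each i, how many of xx[1..i] are <= xx[i],
# using a Fenwick tree indexed by value rank.
count_leq_fenwick <- function(xx) {
  n <- length(xx)
  if (n == 0L) return(integer(0))
  # Dense ranks: tied values share a rank, so "<=" also counts equal values.
  rk <- match(xx, sort(unique(xx)))
  m <- max(rk)
  tree <- integer(m)
  res <- integer(n)
  for (i in seq_len(n)) {
    # Insert the rank of the new observation into the tree.
    k <- rk[i]
    while (k <= m) {
      tree[k] <- tree[k] + 1L
      k <- k + bitwAnd(k, -k)  # move to the next node covering rank k
    }
    # Prefix sum over ranks 1..rk[i]: observations so far with value <= xx[i].
    k <- rk[i]
    s <- 0L
    while (k > 0L) {
      s <- s + tree[k]
      k <- k - bitwAnd(k, -k)  # strip the lowest set bit
    }
    res[i] <- s
  }
  res
}

count_leq_fenwick(c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2))
# [1] 1 1 3 2 4 3 3 4 5 2
```

Computing the ranks up front requires the whole vector to be available. For a truly unbounded stream the ranks are not known in advance, so a balanced tree or a compiled implementation would likely be a better fit.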