简体   繁体   中英

R - faster alternative to hist(XX, plot=FALSE)$count

I am on the lookout for a faster alternative to R's hist(x, breaks=XXX, plot=FALSE)$count function as I don't need any of the other output that is produced (as I want to use it in an sapply call, requiring 1 million iterations in which this function would be called), eg

x = runif(100000000, 2.5, 2.6)
bincounts = hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count

Any thoughts?

A first attempt using table and cut :

table(cut(x, breaks=seq(0,3,length.out=100)))

It avoids the extra output, but takes about 34 seconds on my computer:

system.time(table(cut(x, breaks=seq(0,3,length.out=100))))
   user  system elapsed 
 34.148   0.532  34.696 

compared to 3.5 seconds for hist :

system.time(hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count)
   user  system elapsed 
  3.448   0.156   3.605

Using tabulate and .bincode runs a little bit faster than hist :

tabulate(.bincode(x, breaks=seq(0,3,length.out=100)), nbins=100)

system.time(tabulate(.bincode(x, breaks=seq(0,3,length.out=100))), nbins=100)
   user  system elapsed 
  3.084   0.024   3.107

Using tablulate and findInterval provides a significant performance boost relative to table and cut and has an OK improvement relative to hist :

tabulate(findInterval(x, vec=seq(0,3,length.out=100)), nbins=100)

system.time(tabulate(findInterval(x, vec=seq(0,3,length.out=100))), nbins=100)
   user  system elapsed 
  2.044   0.012   2.055

Seems your best bet is to just cut out all the overhead of hist.default .

nB1 <- 99
delt <- 3/nB1
fuzz <- 1e-7 * c(-delt, rep.int(delt, nB1))
breaks <- seq(0, 3, by = delt) + fuzz

.Call(graphics:::C_BinCount, x, breaks, TRUE, TRUE)

I pared down to this by running debugonce(hist.default) to get a feel for exactly how hist works (and testing with a smaller vector -- n = 100 instead of 1000000 ).

Comparing:

x = runif(100, 2.5, 2.6)
y1 <- .Call(graphics:::C_BinCount, x, breaks + fuzz, TRUE, TRUE)
y2 <- hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count
identical(y1, y2)
# [1] TRUE

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM