
Divide a range of values into bins of equal length: cut vs cut2

I'm using the cut function to split my data into bins of equal width. It does the job, but I'm not happy with what it returns: I need the center of each bin, not its lower and upper ends.
I've also tried cut2{Hmisc}. It gives me the center of each bin, but it divides the range of the data into bins that contain the same number of observations rather than bins of the same length.

Does anyone have a solution to this?

It's not too hard to make the breaks and labels yourself, with something like this. Since the midpoint is a single number, I don't return a factor with labels here but a numeric vector instead.

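# Note: this cut2 masks Hmisc::cut2 if that package is attached.
cut2 <- function(x, breaks) {
  r <- range(x)
  # 2*breaks + 1 equally spaced points: odd positions are bin edges,
  # even positions are bin centers
  b <- seq(r[1], r[2], length.out = 2 * breaks + 1)
  brk <- b[0:breaks * 2 + 1]
  mid <- b[1:breaks * 2]
  # include.lowest = TRUE keeps min(x) in the first bin
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  mid[k]
}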

There's probably a better way to get the bin breaks and midpoints; I didn't think about it very hard.
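
One possible simplification, offered as a sketch rather than a definitive version: build the n + 1 equally spaced breaks once and take each midpoint as the average of two neighbouring breaks. The name bin_centers and the small test vector are hypothetical and not part of the original answer.

```r
# Hypothetical helper: map each value to the center of its equal-width bin.
bin_centers <- function(x, n) {
  r <- range(x)
  brk <- seq(r[1], r[2], length.out = n + 1)   # n + 1 equally spaced bin edges
  mid <- (head(brk, -1) + tail(brk, -1)) / 2   # average of neighbouring edges
  # include.lowest = TRUE keeps min(x) in the first bin
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  mid[k]
}

centers <- bin_centers(c(0, 5, 10, 15, 20), 3)
print(centers)
```

For this input the three bins cover 0 to 20 with centers 10/3, 10 and 50/3, so 0 and 5 map to the first center, 10 to the second, and 15 and 20 to the third.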

Note that this answer differs from Joshua's: his gives the median of the data in each bin, while this gives the center of each bin.

> head(cut2(x,3))
[1] 16.666667  3.333333 16.666667  3.333333 16.666667 16.666667
> head(ave(x, cut(x,3), FUN=median))
[1] 18  2 18  2 18 18

Use ave like so:

set.seed(21)
x <- sample(0:20, 100, replace=TRUE)
xCenter <- ave(x, cut(x,3), FUN=median)
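
To see what ave is doing, it may help to look at the per-bin medians directly: tapply gives one summary per bin, and ave copies that summary back to every observation in the bin. A minimal sketch with a small hand-made vector (the names y, bins, per_bin and per_obs are illustrative, not the sample data above):

```r
y <- c(0, 1, 2, 9, 10, 11, 18, 19, 20)
bins <- cut(y, 3)                       # three equal-width intervals over range(y)
per_bin <- tapply(y, bins, median)      # one median per bin
per_obs <- ave(y, bins, FUN = median)   # the same medians, aligned with y
print(per_bin)
print(per_obs)
```

Here the bin medians are 1, 10 and 19, and per_obs repeats each of them three times, once per observation in the bin.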

We can use smart_cut from the cutr package:

devtools::install_github("moodymudskipper/cutr")
library(cutr)

Using @Joshua's sample data:

Median by interval (same output as @Joshua, except that it's an ordered factor):

smart_cut(x,3, "n_intervals", labels= ~ median(.))
# [1] 18 2  18 2  18 18 ...
# Levels: 2 < 11 < 18

Center of each interval (same output as @Aaron, except that it's an ordered factor):

smart_cut(x,3, "n_intervals", labels= ~ mean(.y))
# [1] 16.67 3.333 16.67 3.333 16.67 16.67 ...
# Levels: 3.333 < 10 < 16.67

Mean of the values in each interval:

smart_cut(x,3, "n_intervals", labels= ~ mean(.))
# [1] 17.48 2.571 17.48 2.571 17.48 17.48 ...
# Levels: 2.571 < 11.06 < 17.48

labels can be a character vector, just as in base::cut.default, but it can also be, as it is here, a function of two parameters: the first is the values contained in the bin, and the second is the cut points of the bin.
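
The idea of a two-parameter label function can also be sketched in base R. This is not cutr's implementation: label_bins and its arguments are hypothetical names, and the sketch assumes equal-width bins over range(x) and returns a numeric vector rather than an ordered factor.

```r
# Hypothetical helper, not part of cutr: label each equal-width bin with
# f(values_in_bin, cut_points_of_bin) and return that label for every value.
label_bins <- function(x, n, f) {
  r <- range(x)
  brk <- seq(r[1], r[2], length.out = n + 1)
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  labs <- rep(NA_real_, n)               # empty bins keep an NA label
  for (i in unique(k)) {
    labs[i] <- f(x[k == i], brk[c(i, i + 1)])
  }
  labs[k]
}

y <- c(0, 1, 2, 9, 10, 11, 18, 19, 20)
by_median <- label_bins(y, 3, function(v, cuts) median(v))   # median of the values in each bin
by_center <- label_bins(y, 3, function(v, cuts) mean(cuts))  # center of each bin
```

Passing a function that uses only its first argument reproduces the per-bin median, while one that uses only its second argument reproduces the bin center.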

more on cutr and smart_cut
