Avoiding empty and small groups when using pretty_breaks with cut2
I'm working with variables resembling the val values created below:
# data --------------------------------------------------------------------
data("mtcars")
val <- c(mtcars$wt, 10.55)
I'm cutting this variable in the following manner:
# Cuts --------------------------------------------------------------------
cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)
which produces the following results:
> table(res)
res
[ 1, 2) [ 2, 3) [ 3, 4) [ 4, 5) [ 5, 6)       6       7       8       9 [10,11]
      4       8      16       1       3       0       0       0       0       1
In the created output I would like to change the following:
For convenience, the full code is available below:
# Libs --------------------------------------------------------------------
Vectorize(require)(package = c("scales", "Hmisc"),
character.only = TRUE)
# data --------------------------------------------------------------------
data("mtcars")
val <- c(mtcars$wt, 10.55)
# Cuts --------------------------------------------------------------------
cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)
I tried playing with the eps.correct = 0 value in pretty_breaks, as in the code:
cut_breaks <- pretty_breaks(n = cuts, eps.correct = 0)(variable)
but none of the values gets me anywhere close.
I've also tried using the m = 5 argument in the cut2 function, but I keep arriving at the same result.
I tried the mybreaks function, but I would have to put some work into it to get nice cuts for more bizarre variables. Broadly speaking, pretty_breaks cuts well for me; it is just the tiny groups that occur from time to time that are not desired.
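One way to keep pretty-style breaks while getting rid of empty and tiny groups is to drop interior break points until every bin is large enough. The following is only a minimal sketch under stated assumptions: merge_sparse_breaks and its min_n argument are hypothetical names, not part of scales or Hmisc, and base R's pretty() and cut() stand in for pretty_breaks and cut2 (for val, pretty(val, n = 10) yields the same breaks 1 to 11 shown in the table above).

```r
# Hypothetical helper (not part of scales or Hmisc): remove interior breaks
# until every bin holds at least min_n observations.
merge_sparse_breaks <- function(x, breaks, min_n = 3) {
  breaks <- sort(unique(breaks))
  repeat {
    if (length(breaks) <= 2) break          # only one bin left, nothing to merge
    counts <- table(cut(x, breaks, right = FALSE, include.lowest = TRUE))
    small <- which(counts < min_n)
    if (length(small) == 0) break           # every bin is large enough
    i <- small[1]                           # first sparse bin
    # Merge it with its right neighbour by dropping its upper edge;
    # the last bin is merged to the left by dropping its lower edge.
    drop <- if (i < length(counts)) i + 1 else i
    breaks <- breaks[-drop]
  }
  breaks
}

data("mtcars")
val <- c(mtcars$wt, 10.55)

# Start from pretty breaks, then merge bins with fewer than 3 values.
cut_breaks <- merge_sparse_breaks(val, pretty(val, n = 10), min_n = 3)
res <- cut(val, cut_breaks, right = FALSE, include.lowest = TRUE)
table(res)
```

For val this leaves breaks at 1, 2, 3, 4 and 11, with counts 4, 8, 16 and 5. Merging to the right first is an arbitrary design choice; merging a sparse bin into its smaller neighbour would be an equally reasonable rule.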
> set.seed(1); require(scales)
> mybreaks <- function(x, n, r=0) {
+ unique(round(quantile(x, seq(0, 1, length=n+1)), r))
+ }
> x <- runif(n = 100)
> pretty_breaks(n = 5)(x)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> mybreaks(x = x, n = 5)
[1] 0 1
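The collapse to 0 1 comes from the default r = 0: every quantile of data lying in [0, 1] is rounded to a whole number, and unique() then removes the repeats. As a small sketch, assuming two decimal places are an acceptable precision for the breaks, passing a larger r keeps them distinct:

```r
set.seed(1)

# Same function as above, with length.out spelled out in full.
mybreaks <- function(x, n, r = 0) {
  unique(round(quantile(x, seq(0, 1, length.out = n + 1)), r))
}

x <- runif(n = 100)

# r = 2 rounds to two decimals, so the six quantile breaks stay distinct.
breaks_r2 <- mybreaks(x = x, n = 5, r = 2)
length(breaks_r2)
```

The value of r therefore has to be chosen with the scale of the variable in mind, which is the extra work mentioned above.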
You could use the quantile() function as a relatively easy way to get similar numbers of observations in each of your groups.
For example, here's a function that takes a vector of values x, a desired number of groups n, and a desired number of decimal places r to round the breaks to, and gives you suggested cut points.
mybreaks <- function(x, n, r=0) {
unique(round(quantile(x, seq(0, 1, length=n+1)), r))
}
cut_breaks <- mybreaks(val, 5)
res <- cut(val, cut_breaks, include.lowest=TRUE)
table(res)
 [2,3]  (3,4] (4,11]
     8     16      5
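Note that the counts above add up to 29 while val holds 33 values: round() moves the minimum, 1.513, up to 2, so the four weights below 2 fall outside the breaks and become NA. One possible fix, shown here only as a sketch, is to round the outer breaks outward. The name mybreaks_cover is hypothetical and not part of the original answer.

```r
# Hypothetical variant of mybreaks: the outer breaks are rounded outward
# so that the full range of x is always covered.
mybreaks_cover <- function(x, n, r = 0) {
  q <- quantile(x, seq(0, 1, length.out = n + 1), names = FALSE)
  b <- round(q, r)
  scale <- 10^r
  b[1] <- floor(min(x) * scale) / scale            # never above the minimum
  b[length(b)] <- ceiling(max(x) * scale) / scale  # never below the maximum
  unique(b)
}

data("mtcars")
val <- c(mtcars$wt, 10.55)

cut_breaks <- mybreaks_cover(val, 5)
res <- cut(val, cut_breaks, include.lowest = TRUE)

# useNA = "ifany" would show an NA column if any value were left out.
table(res, useNA = "ifany")
```

With this variant every value of val is assigned to a group, although the groups are still far from equal in size because several rounded quantiles coincide.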