Avoiding empty and small groups when using pretty_breaks with cut2

Question

I'm working with variables resembling the val values created below:

# data --------------------------------------------------------------------

data("mtcars")
val <- c(mtcars$wt, 10.55)

I'm cutting this variable in the following manner:

# Cuts --------------------------------------------------------------------

cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)

which produces the following results:

> table(res)
res
[ 1, 2) [ 2, 3) [ 3, 4) [ 4, 5) [ 5, 6)       6       7       8       9 [10,11] 
      4       8      16       1       3       0       0       0       0       1

In the created output I would like to change the following:

I'm not interested in creating groups with a single value. Ideally, I would like each group to have at least 3 or 4 values. Paradoxically, I can live with groups having 0 values, as those will be dropped later on when merging with my real data.
Any changes to the cutting mechanism have to work on a variable with integer values.
The cuts have to be pretty. I'm trying to avoid something like 1.23 - 2.35, even if those values would be most sensible considering the distribution.
In effect, what I'm trying to achieve is this: try to make more or less even, pretty groups, and if a group comes out really tiny, merge it with the next group; do not worry about empty groups.

Full code

For convenience, the full code is available below:

# Libs --------------------------------------------------------------------

# Load scales (for pretty_breaks) and Hmisc (for cut2)
Vectorize(require)(package = c("scales", "Hmisc"),
                   character.only = TRUE)

# data --------------------------------------------------------------------

data("mtcars")
val <- c(mtcars$wt, 10.55)

# Cuts --------------------------------------------------------------------

cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)

What I've tried

First approach

I tried to play with the eps.correct = 0 value in pretty_breaks, as in the code:

cut_breaks <- pretty_breaks(n = cuts, eps.correct = 0)(variable)

but none of the values I tried got me anywhere close.

Second approach

I've also tried using the m = 5 argument in the cut2 function, but I keep arriving at the same result.

Comment replies

My breaks function

I tried the mybreaks function, but I would have to put some work into it to get nice cuts for more bizarre variables. Broadly speaking, pretty_breaks cuts well for me; it is just the tiny groups that occur from time to time that are not desired.

> set.seed(1); require(scales)
> mybreaks <- function(x, n, r=0) {
+   unique(round(quantile(x, seq(0, 1, length=n+1)), r))
+ }
> x <- runif(n = 100)
> pretty_breaks(n = 5)(x)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> mybreaks(x = x, n = 5)
[1] 0 1
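
The two breaks above are a consequence of rounding: with r = 0 every quantile of a variable in [0, 1] rounds to either 0 or 1. As a small illustration only (same mybreaks function, just called with r = 1; the name breaks_r1 is made up for this example), keeping one decimal place lets more breaks survive, though they are not guaranteed to be as evenly spaced as the pretty_breaks ones:

```r
set.seed(1)

# Same helper as above: rounded quantiles, duplicates removed
mybreaks <- function(x, n, r = 0) {
  unique(round(quantile(x, seq(0, 1, length.out = n + 1)), r))
}

x <- runif(n = 100)

# r = 1 rounds the quantiles to one decimal place instead of to integers
breaks_r1 <- mybreaks(x = x, n = 5, r = 1)
print(breaks_r1)
```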

Answer 1

You could use the quantile() function as a relatively easy way to get similar numbers of observations in each of your groups.

For example, here's a function that takes a vector of values x, a desired number of groups n, and a desired rounding-off point r for the breaks, and gives you suggested cut points.

mybreaks <- function(x, n, r=0) {
  unique(round(quantile(x, seq(0, 1, length=n+1)), r))
}

cut_breaks  <- mybreaks(val, 5)
res <- cut(val, cut_breaks, include.lowest=TRUE)
table(res)

 [2,3]  (3,4] (4,11] 
     8     16      5
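
Note that the table above sums to 29 rather than 33: with r = 0 the lowest break rounds up to 2, so the four weights below 2 fall outside the breaks and come back as NA. If the pretty breaks themselves should be kept, another option is to start from them and drop the break that closes any tiny group. The following is only a rough sketch under stated assumptions: merge_small_bins and its min_n argument are hypothetical names, base pretty() stands in for scales::pretty_breaks, and base cut() stands in for Hmisc::cut2.

```r
# Sketch only: merge_small_bins() and min_n are hypothetical names.
# Start from pretty breaks, then repeatedly remove one break next to a
# non-empty bin that holds fewer than min_n values. Empty bins are ignored.
merge_small_bins <- function(x, n = 10, min_n = 3) {
  breaks <- pretty(x, n = n)  # base R stand-in for scales::pretty_breaks
  repeat {
    bins <- cut(x, breaks, right = FALSE, include.lowest = TRUE)
    counts <- as.vector(table(bins))  # includes zero counts for empty bins
    small <- which(counts > 0 & counts < min_n)
    if (length(small) == 0 || length(breaks) <= 2) break
    i <- small[1]
    # Drop the upper edge of bin i so it merges with the next bin;
    # the last bin has no next bin, so drop its lower edge instead.
    drop <- if (i < length(counts)) i + 1 else i
    breaks <- breaks[-drop]
  }
  breaks
}

data("mtcars")
val <- c(mtcars$wt, 10.55)

new_breaks <- merge_small_bins(val, n = 10, min_n = 3)
res <- cut(val, new_breaks, right = FALSE, include.lowest = TRUE)
print(table(res))
```

On val with min_n = 3 this should leave breaks at 1, 2, 3, 4 and 11, so the lone value 10.55 ends up in the same group as the weights between 4 and 6. Because only existing pretty breaks are removed, the remaining cut points stay round numbers, and the same loop works unchanged on integer-valued variables.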

Answer 1
1 accepted 2016-01-04 22:29:25