
Using cut2 from Hmisc to calculate cuts for different numbers of groups

I was trying to calculate equal-quantile cuts for a vector using cut2 from Hmisc.

library(Hmisc)
c <- c(-4.18304,-3.18343,-2.93237,-2.82836,-2.13478,-2.01892,-1.88773,
       -1.83124,-1.74953,-1.74858,-0.63265,-0.59626,-0.5681)

cut2(c, g=3, onlycuts=TRUE)

[1] -4.18304 -2.01892 -1.74858 -0.56810

But I was expecting the following result (33%, 33%, 33%):

[1] -4.18304 -2.13478 -1.74858 -0.56810

Should I still use cut2, or try something different? How can I make it work? Thanks for your advice.

You are seeing the cutpoints, but you want the tabular counts, and you want them as fractions of the total, so do this instead:

> prop.table(table(cut2(c, g=3) ) )

[-4.18,-2.019) [-2.02,-1.749) [-1.75,-0.568] 
     0.3846154      0.3076923      0.3076923 

(Obviously you cannot expect cut2 to create an exact split when the count of elements is not evenly divisible by 3.)
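
The arithmetic behind that remark can be sketched in a few lines of base R. This only illustrates the most even split that is possible, not cut2's internal algorithm; `n`, `g`, `base_size`, `remainder`, and `sizes` are names made up for the example.

```r
n <- 13   # number of values in the question's vector
g <- 3    # number of requested groups

base_size <- n %/% g   # every group gets at least this many values (4)
remainder <- n %% g    # values left over after an even split (1)

# Hand the leftover values to the first groups, one each
sizes <- rep(base_size, g) + (seq_len(g) <= remainder)

sizes        # 5 4 4
sizes / n    # 0.3846154 0.3076923 0.3076923
```

The proportions are the same as those in the prop.table output above: with 13 values, 5/4/4 is as close to thirds as any split can get.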

It seems that there were accidentally thirteen values in the original data set instead of twelve. Thirteen values cannot be divided equally into three quantile groups (as mentioned by BondedDust). Here is the original problem with one data value (-1.74953) excluded, leaving twelve values. This gives the result originally expected:

library(Hmisc)

c<-c(-4.18304,-3.18343,-2.93237,-2.82836,-2.13478,-2.01892,-1.88773,-1.83124,-1.74858,-0.63265,-0.59626,-0.5681)

cut2(c, g=3,onlycuts=TRUE)
#[1] -4.18304 -2.13478 -1.74858 -0.5681
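
As a cross-check that does not depend on Hmisc, the same boundaries can be reproduced in base R by sorting the twelve values, splitting them into three groups of four, and taking the smallest value of each group plus the overall maximum. This is a sketch of the idea under the assumption of equal-sized groups, not a description of how cut2 is implemented; `x`, `group`, and `cuts` are hypothetical names.

```r
# The twelve values from above (-1.74953 already removed)
x <- c(-4.18304, -3.18343, -2.93237, -2.82836, -2.13478, -2.01892,
       -1.88773, -1.83124, -1.74858, -0.63265, -0.59626, -0.5681)

sorted <- sort(x)
group  <- rep(1:3, each = 4)   # three groups of four sorted values

# Lower boundary of each group, followed by the overall maximum
cuts <- c(as.numeric(tapply(sorted, group, min)), max(sorted))

cuts   # -4.18304 -2.13478 -1.74858 -0.56810
```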


To make it clearer to anyone not familiar with cut2 from the Hmisc package (like me as of this morning), here's a similar problem, except that we'll use the integers 1 through 12 (assigned to the vector dozen_values).

library(Hmisc)

dozen_values <-1:12

quantile_groups <- cut2(dozen_values,g=3)

levels(quantile_groups)
## [1] "[1, 5)" "[5, 9)" "[9,12]"

cutpoints <- cut2(dozen_values, g=3, onlycuts=TRUE)

cutpoints
## [1]  1  5  9 12

# Show which values belong to which quantile group, using a data frame
quantile_DF <- data.frame(dozen_values, quantile_groups)
names(quantile_DF) <- c("value", "quantile_group")

quantile_DF
##    value quantile_group
## 1      1         [1, 5)
## 2      2         [1, 5)
## 3      3         [1, 5)
## 4      4         [1, 5)
## 5      5         [5, 9)
## 6      6         [5, 9)
## 7      7         [5, 9)
## 8      8         [5, 9)
## 9      9         [9,12]
## 10    10         [9,12]
## 11    11         [9,12]
## 12    12         [9,12]

Notice that the first quantile group includes everything up to, but not including, 5 (i.e. 1 through 4, in this case). The second quantile group contains 5 up to, but not including, 9 (i.e. 5 through 8, in this case). The third (last) quantile group contains 9 through 12. Unlike the other quantile groups, it includes the last value shown, 12.
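
The same half-open convention can be reproduced with base R's cut(), which may help if the notation is new. This sketch supplies the cutpoints by hand instead of computing them: `right = FALSE` makes each interval closed on the left and open on the right, and `include.lowest = TRUE` closes the last interval so that 12 is not dropped. The names `breaks` and `groups` are only for illustration.

```r
dozen_values <- 1:12
breaks <- c(1, 5, 9, 12)   # the cutpoints reported by cut2 above

groups <- cut(dozen_values, breaks = breaks,
              right = FALSE, include.lowest = TRUE)

levels(groups)   # "[1,5)" "[5,9)" "[9,12]"
table(groups)    # four values in each group
```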

Anyway, you can see that the "cutpoints" 1, 5, 9, and 12 describe the start and end points of the quantile groups in the most concise way, but they are hard to interpret without reading the relevant documentation (the link goes to the single-page Inside-R site rather than the almost 400-page PDF manual).

See this explanation of the parenthesis vs. square-bracket notation if it is unfamiliar to you.
