
How to categorize a continuous variable into 4 groups of the same size in R?

I need to categorize a continuous variable into 4 classes, each with the same number of observations. I have used the function

cut(x, breaks = quantile(x, probs = seq(0, 1, 0.25)), include.lowest = TRUE, right = FALSE)

My problem is that the number of observations in each category is not exactly the same, because several observations have exactly the same value as a quantile. How can I get groups of exactly equal size?
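
A minimal sketch of why ties break the equal split, using a small made-up vector (the values are hypothetical, chosen only for illustration): when several observations equal a break value, cut() has to put all of them into the same interval, so no break can divide them.

```r
# Hypothetical toy data: eight values, three of them tied at 2
x <- c(1, 2, 2, 2, 3, 4, 5, 6)

# Quartile breaks with R's default quantile type 7: 1, 2, 2.5, 4.25, 6
breaks <- quantile(x, probs = seq(0, 1, 0.25))

# Same call as in the question: left-closed intervals, last one also right-closed
groups <- cut(x, breaks = breaks, include.lowest = TRUE, right = FALSE)

# All three 2s fall into [2,2.5), so the counts are 1, 3, 2, 2 instead of 2, 2, 2, 2
table(groups)
```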

My variable is waiting:

[1] 79 54 74 62 85 55 88 85 51 85 54 84 78 47 83 52 62 84 52 79 51 47 78 69 74
[26] 83 55 76 78 79 73 77 66 80 74 52 48 80 59 90 80 58 84 58 73 83 64 53 82 59
[51] 75 90 54 80 54 83 71 64 77 81 59 84 48 82 60 92 78 78 65 73 82 56 79 71 62
[76] 76 60 78 76 83 75 82 70 65 73 88 76 80 48 86 60 90 50 78 63 72 84 75 51 82
[101] 62 88 49 83 81 47 84 52 86 81 75 59 89 79 59 81 50 85 59 87 53 69 77 56 88
[126] 81 45 82 55 90 45 83 56 89 46 82 51 86 53 79 81 60 82 77 76 59 80 49 96 53
[151] 77 77 65 81 71 70 81 93 53 89 45 86 58 78 66 76 63 88 52 93 49 57 77 68 81
[176] 81 73 50 85 74 55 77 83 83 51 78 84 46 83 55 81 57 76 84 77 81 87 77 51 78
[201] 60 82 91 53 78 46 77 84 49 83 71 80 49 75 64 76 53 94 55 76 50 82 54 75 78
[226] 79 78 78 70 79 70 54 86 50 90 54 54 77 79 64 75 47 86 63 85 82 57 82 67 74
[251] 54 83 73 73 88 80 71 83 56 79 78 84 58 83 43 60 75 81 46 90 46 74

which is in R's built-in faithful dataset. It has 272 observations, which is divisible by 4, so each category should contain 68 observations.

I have used:

waiting <- faithful$waiting
newwait <- cut(waiting, breaks = quantile(waiting, probs = seq(0, 1, 0.25)), include.lowest = TRUE, right = FALSE)

table(newwait)
newwait
[43,58) [58,76) [76,82) [82,96] 
     66      68      67      71 

As you can see, the number of observations in each group is similar but not exactly the same.
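
To see where the uneven counts come from, here is a small check (a sketch, not part of the original question) that confirms the arithmetic and counts how many waiting times fall exactly on each inner quartile. With right = FALSE, every observation equal to a break is placed in the interval that starts at that value, so a tied run can never be split by the break.

```r
# waiting comes from R's built-in faithful dataset (package datasets)
w <- faithful$waiting
length(w)        # 272, so an exact split would be 68 per group
length(w) %% 4   # 0

# Inner quartiles used as break points: 58, 76 and 82
q <- quantile(w, probs = c(0.25, 0.5, 0.75))
q

# Number of observations sitting exactly on each inner break
ties <- sapply(q, function(b) sum(w == b))
ties
```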

Basically, it sounds like you need to deal with ties. You also need a vector whose length is divisible by 4 (otherwise the groups cannot all be the same size), but I'll assume you know that.
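
If the length is not a multiple of 4, one possible variant (not from the original answer) is to map tie-broken ranks directly to group numbers with ceiling(), which keeps group sizes within one of each other. The helper name equal_groups and the toy data are hypothetical.

```r
# Hypothetical helper: split x into k groups whose sizes differ by at most 1.
# Ties are broken by order of appearance (ties.method = "first"), so tied
# values may end up in different groups.
equal_groups <- function(x, k = 4) {
  n <- length(x)
  r <- rank(x, ties.method = "first")  # a permutation of 1..n
  ceiling(k * r / n)                   # group labels 1..k
}

# Usage with 10 values (not divisible by 4); the four 3s are split across groups
x <- c(5, 1, 3, 3, 3, 2, 8, 7, 3, 9)
g <- equal_groups(x)
table(g)  # sizes 2, 3, 2, 3
```

The trade-off is the same as in the rank-based answer: equal sizes are bought by assigning identical values to different groups.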

Here's a solution that uses the ties.method argument of rank() to break ties:

set.seed(1)
x <- round(runif(1000,0,1),1)
table(x)
## x
##   0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1 
##  43 106  95 103 112 109  82 102  95 100  53

y <- rank(x, ties.method='first') # <- this forces tie breaks
cuts <- cut(y, breaks = quantile(y,probs=seq(0,1,0.25)),
               include.lowest=TRUE,
               right=FALSE)
# check that all four groups have the same size:
lapply(split(x,cuts), length)
$`[1,251)`
[1] 250

$`[251,500)`
[1] 250

$`[500,750)`
[1] 250

$`[750,1e+03]`
[1] 250
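
Applied to the faithful data from the question, the same approach should give 68 observations per group. This is a sketch; because the groups are built on ranks, adjacent groups can share a boundary value of waiting.

```r
w <- faithful$waiting

# Break ties by order of appearance, then cut the ranks at their quartiles
r <- rank(w, ties.method = "first")
newwait <- cut(r, breaks = quantile(r, probs = seq(0, 1, 0.25)),
               include.lowest = TRUE, right = FALSE)
table(newwait)  # 68 observations in each of the 4 groups

# Range of waiting times covered by each rank-based group
sapply(split(w, newwait), range)
```

rank() also accepts ties.method = "random", which breaks ties at random instead of by position; set a seed first if you need reproducible groups.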
