Using cut as part of a function in R to calculate quintiles

I've been asked to use cut in R to create quartiles for my variable wt71 in the data set nhefs. Here is my code:

apply_quintiles <- function(x) {
  cut(x, breaks = c(quantile(nhefs$wt71, probs = seq(0, 1, by = 0.25))), labels = c(25, 50, 75, 100), include.lowest = TRUE)
}
nhefs$quintiles <- sapply(nhefs$wt71, apply_quintiles)
head(mean_weights)
table(nhefs$quintiles)

Here is my output:

[image]

This is very far from what I was expecting:

[image]

Does anyone know what is going on here?

The table shows the number (N) of rows that fall within each quartile. That is different from the wt71 values reported by summary, which are the thresholds for the 1st quartile, median, and 3rd quartile. (Note: as @Gregor pointed out, these are quartiles, not quintiles.)

To illustrate, I changed the labels to clarify the quartiles produced:

set.seed(1)

nhefs <- data.frame(
  wt71 =  round(runif(100, min=1, max=100), 0)
)

apply_quintiles <- function(x) {
  cut(x, breaks = c(quantile(nhefs$wt71, probs = seq(0, 1, by = 0.25))), labels = c("0-25", "25-50", "50-75", "75-100"), include.lowest = TRUE)
}

nhefs$quintiles <- sapply(nhefs$wt71, apply_quintiles)

table(nhefs$quintiles)

  0-25  25-50  50-75 75-100 
    25     25     26     24 

This shows that the 100 random numbers are spread almost evenly across the 4 quartiles: N=25 fall in the 0-25th percentile range, N=26 in the 50-75th percentile range, and so on. These numbers are not values of wt71; they are the number of data elements (rows) that fall in each percentile range.
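As a side note, cut is vectorized, so the sapply loop is not strictly needed. The following is a minimal sketch of the same binning applied to the whole column at once, assuming the same simulated data as above; the names quartile_breaks, quartile and quartile_counts are hypothetical and not from the original post.

```r
set.seed(1)
nhefs <- data.frame(wt71 = round(runif(100, min = 1, max = 100), 0))

# Quartile thresholds, computed once from the whole column
quartile_breaks <- quantile(nhefs$wt71, probs = seq(0, 1, by = 0.25))

# cut() accepts the entire vector, so no element-by-element loop is needed
nhefs$quartile <- cut(
  nhefs$wt71,
  breaks = quartile_breaks,
  labels = c("0-25", "25-50", "50-75", "75-100"),
  include.lowest = TRUE
)

# Number of rows in each quartile bin (counts, not wt71 values)
quartile_counts <- table(nhefs$quartile)
print(quartile_counts)
```

Computing the breaks once also avoids re-running quantile for every row, which the sapply version does.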

Here's the summary of wt71:

Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
2.00   32.75   49.50   52.24   77.00   99.00 

These values are the thresholds for the 1st quartile, median, and 3rd quartile. Unlike the counts in the table, the thresholds are values of wt71. For example, a wt71 of 30 is below the 1st quartile threshold.
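To see the two kinds of numbers side by side, here is a small sketch, again assuming the simulated data from above. The variable names thresholds and bin_counts are hypothetical. The first result holds values of wt71 (where the bins begin and end), the second holds how many observations land in each bin.

```r
set.seed(1)
wt71 <- round(runif(100, min = 1, max = 100), 0)

# Threshold values of wt71 at the 0th, 25th, 50th, 75th and 100th percentiles
thresholds <- quantile(wt71, probs = seq(0, 1, by = 0.25))
print(thresholds)

# Number of observations falling into each bin defined by those thresholds
bin_counts <- table(cut(wt71, breaks = thresholds, include.lowest = TRUE))
print(bin_counts)
```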

Taking a look at nhefs now:

head(nhefs)

  wt71 quintiles
1   27      0-25
2   38     25-50
3   58     50-75
4   91    75-100
5   21      0-25
6   90    75-100

Notice that each wt71 value is assigned to a quartile. The wt71 of 27 is in the lowest quartile (0-25) because it is less than the 1st quartile threshold of 32.75.
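If what you want is the range of wt71 values inside each quartile rather than the row counts, one possible approach is tapply. This is only a sketch on the simulated data; the column name quartile and the objects bin_min, bin_max and bin_summary are hypothetical.

```r
set.seed(1)
nhefs <- data.frame(wt71 = round(runif(100, min = 1, max = 100), 0))

breaks <- quantile(nhefs$wt71, probs = seq(0, 1, by = 0.25))
nhefs$quartile <- cut(
  nhefs$wt71,
  breaks = breaks,
  labels = c("0-25", "25-50", "50-75", "75-100"),
  include.lowest = TRUE
)

# Smallest and largest wt71 observed within each quartile bin
bin_min <- tapply(nhefs$wt71, nhefs$quartile, min)
bin_max <- tapply(nhefs$wt71, nhefs$quartile, max)

bin_summary <- data.frame(
  min_wt71 = as.vector(bin_min),
  max_wt71 = as.vector(bin_max),
  row.names = names(bin_min)
)
print(bin_summary)
```

Each row of bin_summary reports actual wt71 values, so the largest value in one bin sits below the smallest value in the next.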

Hope this helps!
