Using cut as part of a function in R to calculate quintiles
I've been asked to use cut in R to create quartiles for my variable wt71 in the data set nhefs. Here is my code:
apply_quintiles <- function(x) {
  cut(x,
      breaks = quantile(nhefs$wt71, probs = seq(0, 1, by = 0.25)),
      labels = c(25, 50, 75, 100),
      include.lowest = TRUE)
}
nhefs$quintiles <- sapply(nhefs$wt71, apply_quintiles)
head(mean_weights)
table(nhefs$quintiles)
Here is my output:
This is very far from what I was expecting:
Does anyone know what is going on here?
The table created shows the number (N) of rows that fall within each quartile. That is different from the values computed by summary, which are the wt71 thresholds for the 1st quartile, the median, and the 3rd quartile. (Note: as @Gregor pointed out, these are quartiles, not quintiles.)
To illustrate, I changed the labels to clarify the quartiles produced:
set.seed(1)
nhefs <- data.frame(
  wt71 = round(runif(100, min = 1, max = 100), 0)
)
apply_quintiles <- function(x) {
  cut(x,
      breaks = quantile(nhefs$wt71, probs = seq(0, 1, by = 0.25)),
      labels = c("0-25", "25-50", "50-75", "75-100"),
      include.lowest = TRUE)
}
nhefs$quintiles <- sapply(nhefs$wt71, apply_quintiles)
table(nhefs$quintiles)
  0-25  25-50  50-75 75-100
    25     25     26     24
This demonstrates a roughly equal distribution of the 100 random numbers across the 4 quartiles. There are N=25 between the 0th and 25th percentiles, N=26 between the 50th and 75th percentiles, and so on. These numbers are not values of wt71; they are the number of data elements (rows) that fall in each percentile range.
Here's the summary of wt71:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.00   32.75   49.50   52.24   77.00   99.00
These values correspond to the thresholds for the 1st quartile, the median, and the 3rd quartile. Unlike the counts in the table, these thresholds are on the scale of wt71 itself. For example, a wt71 value of 30 would be below the 1st quartile threshold.
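To make the difference between thresholds and counts concrete, here is a minimal sketch on a small made-up vector (the name wt and its values are assumptions for illustration, not the real nhefs data). quantile gives the cut points on the scale of the variable, while table on the result of cut gives how many observations land in each bin.

```r
# Hypothetical toy data, standing in for nhefs$wt71
wt <- c(10, 20, 30, 40, 50, 60, 70, 80)

# Thresholds: values of wt that split the data into four parts
thresholds <- quantile(wt, probs = seq(0, 1, by = 0.25))
print(thresholds)   # 10, 27.5, 45, 62.5, 80 with the default quantile type

# Counts: number of observations that fall between consecutive thresholds
bins <- cut(wt, breaks = thresholds,
            labels = c("0-25", "25-50", "50-75", "75-100"),
            include.lowest = TRUE)
print(table(bins))  # each bin holds 2 of the 8 values
```

The first printout is what summary reports (minus the mean), and the second is what the table in the question reports.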
Taking a look at nhefs now:
head(nhefs)
  wt71 quintiles
1   27      0-25
2   38     25-50
3   58     50-75
4   91    75-100
5   21      0-25
6   90    75-100
Notice that the different wt71 values are assigned to different quartiles. The wt71 value of 27 is in the lowest quartile (0-25) because it is less than the 1st quartile threshold of 32.75.
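As a side note, cut is vectorized, so the sapply wrapper is probably not needed. The sketch below reuses the simulated data from this answer (not the real nhefs data) and computes the breaks once. The column names quartile and quintile are hypothetical, chosen here for illustration. It also shows what actual quintiles, as in the question title, might look like: steps of 0.2 and five labels.

```r
# Simulated stand-in for the real data, same as in the answer above
set.seed(1)
nhefs <- data.frame(wt71 = round(runif(100, min = 1, max = 100), 0))

# Quartiles: compute the four-bin breaks once, then cut the whole column
q4 <- quantile(nhefs$wt71, probs = seq(0, 1, by = 0.25))
nhefs$quartile <- cut(nhefs$wt71, breaks = q4,
                      labels = c("0-25", "25-50", "50-75", "75-100"),
                      include.lowest = TRUE)

# Quintiles: five bins need six break points and five labels
q5 <- quantile(nhefs$wt71, probs = seq(0, 1, by = 0.2))
nhefs$quintile <- cut(nhefs$wt71, breaks = q5,
                      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
                      include.lowest = TRUE)

print(table(nhefs$quartile))
print(table(nhefs$quintile))
```

One caveat: cut stops with an error if two break points are equal, which can happen with heavily tied data, so the breaks may need checking on the real wt71 column.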
Hope this helps!