
Percentile categories in R

I have a dataset similar to the following, and I want to categorise my values as high/medium/low based on percentiles. I use the code below, but I am confused about the 99th percentile and the values above it.

data(iris)
quantile(iris$Petal.Length, probs = 0.01)  # 1.149: all values below this are low
quantile(iris$Petal.Length, probs = 0.99)  # 6.7: the high-value category should start here

Questions:

  1. There are values greater than the 99th percentile (6.7). Where do these values belong?
  2. What is the medium category?

  1. The values greater than the 99th percentile are your top 1%. Following your argument, those are the high values, i.e. > 6.7.
  2. The medium category is everything below the 99th percentile, excluding what falls below the 1st percentile, i.e. 1.149 < medium < 6.7.
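
The two points above can be sketched in R as follows. This is only a minimal illustration, not code from the question: the names x, q and category are made up for the example, and the handling of values exactly equal to a cut point (counted as medium here) is an assumption, since the inequalities above leave it open.

```r
data(iris)
x <- iris$Petal.Length

# Cut points at the 1st and 99th percentiles of the data
q <- quantile(x, probs = c(0.01, 0.99))

# Assumed convention: low is strictly below the 1st percentile,
# high is strictly above the 99th, everything else is medium
category <- ifelse(x < q[1], "low",
                   ifelse(x > q[2], "high", "medium"))
category <- factor(category, levels = c("low", "medium", "high"))

# Number of observations in each category
table(category)
```

If you prefer a single call, cut() with breaks built from the same quantiles should do a similar job, but check its right and include.lowest arguments so that the boundary values land in the category you intend.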

To make this clearer, here is a graph that shows the 5th and 95th percentiles of human body height. The values are assigned to three categories, as in your example.

[Figure: human body height with the 5th and 95th percentiles marked, split into three categories]
