R - How to find top 10% of multiple columns?
For school, we are trying to find the top 10% of colleges in terms of PhD's, Grad.Rate, and Enrollment. We are using the College data frame from the ISLR package.
I have tried using head() with order() to sort them, but I am not sure whether a college needs to be within the top ten percent of all three categories.
The actual question verbatim: 'Create a dataframe that just includes the colleges that are in the top 10% in terms of PhD's, Grad.Rate and Enrollment.'
Thank you so much.
First, create a logical vector indicating whether a college is in the top 10% for a specific variable:
College$PhD_top10 <- College$PhD >= quantile(College$PhD, probs = 0.9)  # TRUE at or above the 90th percentile
Repeat this for as many variables as you need.
Then subset the data frame based on those variables:
College[College$PhD_top10, ]  # Use & to combine this with the other indicator columns you created.
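Putting the steps above together, here is a minimal sketch that assumes the assignment wants colleges in the top 10% of all three variables at once. To keep it self-contained it uses a small made-up data frame, `colleges`, with hypothetical values and the same column names as ISLR's College data (`PhD`, `Grad.Rate`, `Enroll`); with the real data you would load it with `data(College, package = "ISLR")` and substitute `College` for `colleges`.

```r
# Toy stand-in for ISLR's College data (values are invented for illustration)
colleges <- data.frame(
  PhD       = seq(50, 88, by = 2),
  Grad.Rate = c(seq(40, 74, by = 2), 95, 90),
  Enroll    = c(3000, seq(300, 1150, by = 50), 2500),
  row.names = paste("College", LETTERS[1:20])
)

# One logical vector per variable: TRUE if at or above that column's 90th percentile
top_phd    <- colleges$PhD       >= quantile(colleges$PhD,       probs = 0.9)
top_grad   <- colleges$Grad.Rate >= quantile(colleges$Grad.Rate, probs = 0.9)
top_enroll <- colleges$Enroll    >= quantile(colleges$Enroll,    probs = 0.9)

# Keep only the rows where all three conditions hold
top_all <- colleges[top_phd & top_grad & top_enroll, ]
print(top_all)
```

If the assignment instead means top 10% in at least one category, replacing `&` with `|` would give that reading.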
Try using the quantile function:
quantile(x, probs = seq(0, 1, by= 0.1)) # decile
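As a rough illustration of how the decile output relates to a top-10% cutoff, the sketch below uses a hypothetical numeric vector `x`; the 90th percentile is the threshold that values must meet or exceed.

```r
# Hypothetical values, for illustration only
x <- c(12, 45, 7, 88, 23, 56, 91, 34, 67, 5)

# All deciles: 0%, 10%, ..., 100%
deciles <- quantile(x, probs = seq(0, 1, by = 0.1))
print(deciles)

# The 90th percentile alone, as a plain number (names = FALSE drops the "90%" label)
cutoff <- quantile(x, probs = 0.9, names = FALSE)

# Values in the top 10%
top <- x[x >= cutoff]
print(top)
```

Note that `quantile()` interpolates between observations by default, so the cutoff need not be a value that appears in the data.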