
R - How to find top 10% of multiple columns?

For school, we are trying to find the top 10% of colleges in terms of PhD's, Grad.Rate, and Enrollment. We are using the ISLR package's College data frame.

I have tried using head() with order() to sort them, but I am not sure whether a college needs to be within the top ten percent of all three categories.

The actual question verbatim: 'Create a dataframe that just includes the colleges that are in the top 10% in terms of PhD's, Grad.Rate and Enrollment.'

Thank you so much.

First, create a logical vector indicating whether a college is in the top 10% for a specific variable:

library(ISLR)  # provides the College data frame
College$PhD_top10 <- College$PhD >= quantile(College$PhD, probs = 0.9)  # the comparison is already TRUE/FALSE, so ifelse() is not needed

Repeat this for as many variables as you need.

Then subset the data frame based on those variables:

College[College$PhD_top10, ]  # Combine with the other indicator columns using & to require all of them.
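
Putting these steps together for all three variables might look like the sketch below. It uses a small made-up data frame so that it runs without any packages; the helper name in_top_decile and the toy values are assumptions for illustration only. In the ISLR College data the enrollment column is, as far as I know, named Enroll, so the same approach would apply to PhD, Grad.Rate and Enroll there.

```r
# Toy stand-in for College (made-up values, for illustration only)
set.seed(1)
toy <- data.frame(
  PhD       = sample(30:100, 200, replace = TRUE),
  Grad.Rate = sample(20:100, 200, replace = TRUE),
  Enroll    = sample(100:6000, 200, replace = TRUE)
)

# Hypothetical helper: TRUE where a value is at or above the 90th percentile
in_top_decile <- function(x) x >= quantile(x, probs = 0.9, na.rm = TRUE)

vars <- c("PhD", "Grad.Rate", "Enroll")

# Logical matrix with one column per variable
flags <- sapply(toy[vars], in_top_decile)

# Rows in the top 10% of every variable (the & reading of the question)
top_all <- toy[rowSums(flags) == length(vars), ]

# Rows in the top 10% of at least one variable (the | reading)
top_any <- toy[rowSums(flags) > 0, ]
```

Requiring all three conditions at once is strict, so top_all can be small or even empty; top_any shows the looser reading in case that is what the assignment intends.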

Try using the quantile function:

quantile(x, probs = seq(0, 1, by= 0.1)) # decile
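
The decile output can be used to read off the cutoff directly: the element named 90% is the value a college has to reach to be in the top 10%. A minimal sketch, with x as a made-up numeric vector standing in for a column such as College$PhD:

```r
x <- c(12, 45, 67, 23, 89, 34, 56, 78, 91, 10)  # made-up values

# All deciles, named "0%", "10%", ..., "100%"
deciles <- quantile(x, probs = seq(0, 1, by = 0.1))

# The 90th percentile is the top-10% cutoff
cutoff <- deciles[["90%"]]

# Values at or above the cutoff
top_values <- x[x >= cutoff]
```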
