25th quantile for each column of a data frame in R

I am trying to iterate over a data frame in R. For each column I would like to print the 25th percentile (the 0.25 quantile).

Using data from the nycflights13 package, I am trying the following:

library(dplyr)         # select()
library(nycflights13)  # flights data

abt <- select(flights, sched_dep_time)

for(i in names(abt)) {
  qrt_1 <- quantile(abt[i], c(.25))
  print(qrt_1)
}

However, this gives me the error: Error: Must use a vector in [, not an object of class matrix.

Where am I taking a wrong turn here?

This might not explain why your code does not work, but I want to offer an alternative using lapply():

lapply(mtcars, function (x) quantile(x, 0.25))

This returns the 25% quantile of every column in your data frame as a list. Every column must be numeric, though (which you assume in your example).

You can also use sapply instead of lapply if you want the result as a named vector rather than a list.
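As a rough sketch of the sapply variant on the built-in mtcars data (the variable name q25 is only illustrative): passing names = FALSE to quantile keeps the result named by column alone, instead of appending a .25% suffix to each name.

```r
# 25th percentile of every column, returned as a named numeric vector.
# names = FALSE stops quantile() from labelling each value "25%",
# so the elements of the result are named by column only.
q25 <- sapply(mtcars, quantile, probs = 0.25, names = FALSE)
print(q25)
```

lapply with the same arguments gives the same numbers, but as a list with one element per column.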

In your example you are using select to choose a single column from the flights data frame, which returns a tibble with a single column of scheduled departure times. You are not iterating over the whole data frame.
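Separately, a likely cause of the error itself is the single-bracket subsetting: abt[i] keeps the data frame wrapper, whereas quantile expects a plain numeric vector, which abt[[i]] returns. Here is a minimal sketch of the difference, using a hypothetical toy data frame (toy, with made-up values) in place of the tibble:

```r
# Hypothetical stand-in for the one-column tibble `abt`; the values are made up.
toy <- data.frame(sched_dep_time = c(515, 529, 540, 545, 600))

# Single brackets return a one-column data frame, not a vector.
print(is.data.frame(toy["sched_dep_time"]))

# Double brackets extract the column itself as a numeric vector.
print(is.numeric(toy[["sched_dep_time"]]))

# The original loop, with [[i]] in place of [i].
for (i in names(toy)) {
  qrt_1 <- quantile(toy[[i]], 0.25)
  print(qrt_1)
}
```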

If you want to iterate over the flights data frame, you need to do something like this:

cat("25th Quantiles:\n===============\n")

for(i in names(flights)) 
{ 
  if(is.numeric(flights[[i]])) 
  {
    qrt_1 <- quantile(flights[[i]], c(.25), na.rm = TRUE)
    cat(i, ":", qrt_1, "\n")
  }
}

This prints the following to the console:

#> 25th Quantiles:
#> ===============
#> year : 2013 
#> month : 4 
#> day : 8 
#> dep_time : 907 
#> sched_dep_time : 906 
#> dep_delay : -5 
#> arr_time : 1104 
#> sched_arr_time : 1124 
#> arr_delay : -17 
#> flight : 553 
#> air_time : 82 
#> distance : 502 
#> hour : 9 
#> minute : 8 

You can pipe with dplyr's summarise_if (@emilliman5's comment):

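library(tidyverse)
library(nycflights13)  # provides the flights data

# na.rm = TRUE is needed because several numeric columns contain NA
flights %>% 
  summarise_if(is.numeric, quantile, 0.25, na.rm = TRUE)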

As you didn't provide a reproducible example, you can check with the iris data.

Using summarise_if:

iris %>% 
   summarise_if(is.numeric, quantile, 0.25)

#  Sepal.Length Sepal.Width Petal.Length Petal.Width
#1          5.1         2.8          1.6         0.3     

Or using sapply and select_if (original answer):

iris %>% 
  select_if(is.numeric) %>% 
  sapply(quantile, 0.25)

#Sepal.Length.25%  Sepal.Width.25% Petal.Length.25%  Petal.Width.25% 
#             5.1              2.8              1.6              0.3 
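In dplyr 1.0.0 and later, summarise_if is superseded by across(), so the same summary could also be sketched as below. This assumes a sufficiently recent dplyr is installed; the variable name res is only illustrative, and na.rm = TRUE is included so the same call would also tolerate columns with missing values.

```r
library(dplyr)  # assumes dplyr >= 1.0.0

# One-row summary: 25th percentile of every numeric column of iris.
res <- iris %>%
  summarise(across(where(is.numeric), ~ quantile(.x, 0.25, na.rm = TRUE)))
print(res)
```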
