25th quantile for each column of a data frame in R
I am trying to iterate over a data frame in R. For each column I would like to print the 25th percentile (the 0.25 quantile).
Using data from the nycflights13 package, I am trying the following:
library(dplyr)
library(nycflights13)

abt <- select(flights, sched_dep_time)
for (i in names(abt)) {
  qrt_1 <- quantile(abt[i], c(.25))
  print(qrt_1)
}
However, this gives me the error:
Error: Must use a vector in `[`, not an object of class matrix.
Where am I taking a wrong turn here?
This may not explain why your code does not work, but here is an alternative using lapply():
lapply(mtcars, function (x) quantile(x, 0.25))
This returns the 25% quantile of every column in your data frame. Every column must be numeric, though (which your example assumes).
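If the data frame also contains non-numeric columns, one option is to drop them before applying quantile(). The following is only a minimal sketch: it uses base R's Filter() on the built-in iris data (picked here purely for illustration), and na.rm = TRUE is an added assumption so that missing values do not stop the calculation.

```r
# Keep only the numeric columns; iris$Species (a factor) is dropped.
numeric_cols <- Filter(is.numeric, iris)

# 0.25 quantile of each remaining column, returned as a list.
# na.rm = TRUE is an assumption added for data that may contain NAs.
q25 <- lapply(numeric_cols, function(x) quantile(x, 0.25, na.rm = TRUE))
print(q25)
```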
You can also use sapply instead of lapply if you want the result as a vector rather than a list.
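To make the difference concrete, here is a small sketch on mtcars. The names = FALSE argument to quantile() is an addition not in the answer above; it is used here only so that the sapply result is named by column rather than by "column.25%".

```r
# lapply() returns a list with one element per column.
q_list <- lapply(mtcars, function(x) quantile(x, 0.25, names = FALSE))

# sapply() simplifies the same result to a named numeric vector.
q_vec <- sapply(mtcars, function(x) quantile(x, 0.25, names = FALSE))

str(q_list[1:2])  # first two list elements
print(q_vec)      # one value per column, named by column
```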
In your example you are using select to choose a single column from the "flights" data frame, which returns a tibble with a single column giving scheduled departure times. Your loop therefore runs once, over that one column, rather than over the whole data frame. Inside the loop, abt[i] is still a one-column tibble rather than a numeric vector, which is what quantile() expects; abt[[i]] extracts the vector.
If you want to iterate over the flights data frame you need to do something like this:
cat("25th Quantiles:\n===============\n")
for (i in names(flights)) {
  if (is.numeric(flights[[i]])) {
    qrt_1 <- quantile(flights[[i]], c(.25), na.rm = TRUE)
    cat(i, ":", qrt_1, "\n")
  }
}
This prints the following to the console:
#> 25th Quantiles:
#> ===============
#> year : 2013
#> month : 4
#> day : 8
#> dep_time : 907
#> sched_dep_time : 906
#> dep_delay : -5
#> arr_time : 1104
#> sched_arr_time : 1124
#> arr_delay : -17
#> flight : 553
#> air_time : 82
#> distance : 502
#> hour : 9
#> minute : 8
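The loop above indexes with [[i]] rather than [i]. As a rough illustration of why that matters, the sketch below uses the built-in mtcars data (an assumption made so it runs without nycflights13); a tibble likewise returns a tibble from [ and a vector from [[.

```r
one_col <- mtcars["mpg"]    # single brackets: still a data frame with one column
as_vec  <- mtcars[["mpg"]]  # double brackets: a plain numeric vector

class(one_col)  # "data.frame"
class(as_vec)   # "numeric"

# quantile() expects a vector, so pass the [[ ]] result.
quantile(as_vec, 0.25, na.rm = TRUE)
```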
You can pipe into dplyr's summarise_if (per @emilliman5's comment):
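library(tidyverse)
library(nycflights13)  # provides the flights data
flights %>%
  # na.rm = TRUE is needed because several flights columns contain NAs
  summarise_if(is.numeric, quantile, 0.25, na.rm = TRUE)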
As you didn't provide a reproducible example, you can check with the iris data:
using summarise_if:
iris %>%
summarise_if(is.numeric, quantile, 0.25)
# Sepal.Length Sepal.Width Petal.Length Petal.Width
#1 5.1 2.8 1.6 0.3
or using sapply and select_if (original answer):
iris %>%
select_if(is.numeric) %>%
sapply(quantile, 0.25)
#Sepal.Length.25% Sepal.Width.25% Petal.Length.25% Petal.Width.25%
# 5.1 2.8 1.6 0.3
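In dplyr 1.0.0 and later, scoped verbs such as summarise_if are superseded by across(). An equivalent might look like the sketch below, which assumes dplyr >= 1.0.0 is installed; na.rm = TRUE is an addition that makes no difference for iris but helps on data with missing values.

```r
library(dplyr)

# Same idea as summarise_if(is.numeric, quantile, 0.25), written with across().
iris_q25 <- iris %>%
  summarise(across(where(is.numeric), ~ quantile(.x, 0.25, na.rm = TRUE)))
print(iris_q25)
```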