
Is it possible with dplyr to filter a dataframe with output created by summarize within one pipe?

I have a dataframe with one numeric variable and one five-level grouping variable.

library(dplyr)  # provides tibble(), %>%, group_by(), summarize(), filter()

# set seed for reproducibility
set.seed(123)
df <- tibble(group = rep(c("a", "b", "c", "d", "e"), each = 20),
             values = c(rnorm(20, 0, 1), rnorm(20, 1, 1), rnorm(20, 2, 1),
                        rnorm(20, 3, 1), rnorm(20, 4, 1)))

I want to use summarize to get the quantiles, like

# two rows per group; dplyr >= 1.1.0 prefers reframe() for multi-row results
df %>% 
  group_by(group) %>%
  summarize(quantiles = quantile(values, c(0.25, 0.75))) 


df %>% 
  group_by(group) %>%
  summarize(quantile0.25 = quantile(values, c(0.25)), 
            quantile0.75 = quantile(values, c(0.75)))

Either one of these would do. I don't know which is more practical: one row per group with the two quantiles as separate variables, or two rows per group with the quantiles in a single variable.

Finally, I want to use those quantiles (preferably in the same pipe) to filter out outliers in the original dataframe, not the summarized one, within each respective group, like

df %>% 
  group_by() %>%
  summarize() %>%
  filter()

where each group is filtered by its own quantiles ± 1.5 × IQR.

Is this possible, and what would be the best approach? I think it would be straightforward to filter by group with a single threshold applied to all groups, but how do I apply a different threshold to each group?

You can write a function to detect outliers via the IQR

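is_iqr_outlier <- function(x) {
   # first and third quartiles of x
   q <- quantile(x, c(0.25, 0.75))
   # interquartile range: Q3 - Q1
   iqr <- diff(q)
   # TRUE where x falls below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
   (x < q[1] - 1.5*iqr) | (x > q[2] + 1.5*iqr)
}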
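
To see what the function computes, here is a minimal sketch that walks through the same steps on a small vector. The vector `x` and the names `lower`, `upper` and `flags` are made up for illustration and are not part of the original data.

```r
# Hypothetical example vector: four ordinary values and one extreme value
x <- c(1, 2, 3, 4, 100)

# First and third quartiles (default quantile type 7): 2 and 4
q <- quantile(x, c(0.25, 0.75))

# Interquartile range: 4 - 2 = 2
iqr <- diff(q)

# Fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR: -1 and 7
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Logical vector, TRUE only for the value outside the fences (100)
flags <- (x < lower) | (x > upper)
print(flags)
```

Only the last element is flagged, because 100 lies above the upper fence of 7.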

Then you can use that function directly in filter()

df %>% 
  group_by(group) %>%
  filter(!is_iqr_outlier(values))

Because the data is grouped, filter() evaluates the function separately within each group, so every group gets its own thresholds. Your sample data doesn't seem to have any outliers, so it's not a great test case.
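
For a test case that does contain outliers, here is a sketch on a hypothetical toy tibble (`toy` and its values are invented for illustration). It also shows, as an assumption about what the question had in mind, a second route that really does go through summarize(): compute the per-group quartiles, join them back onto the original rows, then filter. The column names `q25` and `q75` are arbitrary.

```r
library(dplyr)

is_iqr_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  iqr <- diff(q)
  (x < q[1] - 1.5 * iqr) | (x > q[2] + 1.5 * iqr)
}

# Hypothetical toy data: each group has one value far from the rest
toy <- tibble(
  group  = rep(c("a", "b"), each = 5),
  values = c(1, 2, 3, 4, 100, 10, 11, 12, 13, -50)
)

# Route 1: grouped filter with the helper function
kept_filter <- toy %>%
  group_by(group) %>%
  filter(!is_iqr_outlier(values)) %>%
  ungroup()

# Route 2: summarize the quartiles per group, join them back, then filter
kept_join <- toy %>%
  left_join(
    toy %>%
      group_by(group) %>%
      summarize(q25 = quantile(values, 0.25, names = FALSE),
                q75 = quantile(values, 0.75, names = FALSE)),
    by = "group"
  ) %>%
  filter(values >= q25 - 1.5 * (q75 - q25),
         values <= q75 + 1.5 * (q75 - q25)) %>%
  select(group, values)

print(kept_filter)
print(kept_join)
```

Both routes drop 100 from group a and -50 from group b and keep the other eight rows. The grouped filter is shorter; the join route may be useful if you also want to keep the thresholds as columns.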
