Apply function on dataframe by specific group in R

I have a dataframe that looks something like this:

   dist    id daytime season
3  1.11 Name1     day summer
4  2.22 Name2   night spring
5  3.33 Name1     day winter
6  4.44 Name3   night   fall

I want a summary of dist grouped by specific columns in my dataframe.

So far I have used a custom function:

path.summary <- function(x){df %>%
    group_by(x) %>% 
    summarize(min = min(dist),
              q1 = quantile(dist, 0.25),
              median = median(dist),
              mean = mean(dist),
              q3 = quantile(dist, 0.75),
              max = max(dist))}

And applied it to whichever column I wanted at the time:

summary_ID <- path.summary(id)

I tried it a few weeks ago and got something like this:

  id       min    q1 median  mean    q3   max
   <chr>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
 1 Name1   0     17.8   310.   788. 1023. 5832.
 2 Name2   0     31.7   284.   570.  744. 9578.
 3 Name3   0     17.0   325.   721. 1185. 5293.
 4 Name4   0     11.9   197.   530.  865. 3476.
 5 Name5   0     24.5    94.9  617.  966. 9567.

When I try it now, I get an error:

Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `x` is not found.

What changed, and how do I get around the issue?

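Here, we may use {{}} if the input is unquoted. group_by() evaluates its arguments inside the data, so a bare x is looked up as a column literally named x; wrapping the argument as {{x}} passes along the column the caller supplied instead. The function below also takes the data as an argument rather than relying on a global df.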

path_summary <- function(dat, x){
  dat %>%                               
    group_by({{x}}) %>% 
    summarize(min = min(dist),
              q1 = quantile(dist, 0.25),
              median = median(dist),
              mean = mean(dist),
              q3 = quantile(dist, 0.75),
              max = max(dist))
}

-testing

> path_summary(df, id)
# A tibble: 3 × 7
  id      min    q1 median  mean    q3   max
  <chr> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Name1  1.11  1.66   2.22  2.22  2.78  3.33
2 Name2  2.22  2.22   2.22  2.22  2.22  2.22
3 Name3  4.44  4.44   4.44  4.44  4.44  4.44

data

df <- structure(list(dist = c(1.11, 2.22, 3.33, 4.44), id = c("Name1", 
"Name2", "Name1", "Name3"), daytime = c("day", "night", "day", 
"night"), season = c("summer", "spring", "winter", "fall")), 
class = "data.frame", row.names = c("3", 
"4", "5", "6"))
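
If the column name arrives as a character string instead (for example when looping over several columns), {{}} does not apply. A minimal sketch of one alternative, assuming dplyr 1.0.0 or later, is to group with across(all_of(...)). The name path_summary_chr and the summaries list are hypothetical, introduced only for this illustration, and the data frame is rebuilt here so the block runs on its own:

```r
library(dplyr)

# Same example data as above, rebuilt so this block is self-contained
df <- data.frame(
  dist = c(1.11, 2.22, 3.33, 4.44),
  id = c("Name1", "Name2", "Name1", "Name3"),
  daytime = c("day", "night", "day", "night"),
  season = c("summer", "spring", "winter", "fall")
)

# Hypothetical variant of path_summary(): `cols` is a character vector
# of column names, e.g. "id" or c("id", "season")
path_summary_chr <- function(dat, cols) {
  dat %>%
    group_by(across(all_of(cols))) %>%
    summarize(min = min(dist),
              q1 = quantile(dist, 0.25),
              median = median(dist),
              mean = mean(dist),
              q3 = quantile(dist, 0.75),
              max = max(dist),
              .groups = "drop")  # return an ungrouped tibble
}

# One column at a time
summary_id <- path_summary_chr(df, "id")

# Or one summary per grouping column, collected in a list
summaries <- lapply(c("id", "daytime", "season"),
                    function(col) path_summary_chr(df, col))
```

all_of() raises an error if a requested column is missing, which fails early instead of silently grouping by the wrong thing.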
