In R, apply a sequence of complex functions using the "do" function in dplyr

I would like to know whether it is possible to apply a sequence of special functions using pipe chaining. For example, suppose that I have the following data, with 3 levels in group_var:

head(df)
 v1  v2 .... v9  group_var
  1   2       0          1
  5   3       2          0 
  2   1       3          1 
  1   8       9          2
  7   6       0          1
  5   9       2          0 

My first question: I would like to do the following

res<- df %>% group_by(group_var) %>% do( out = special_function(.) )  

where the special function has a different component function for each group. That is, in pseudocode, the special function is

special_function = (f0,f1,f2) 

so that f0, f1, f2 are mutually different. For example

f0<- function(data) apply(data, 2, min)
f1<- function(data) t(data)
f2<- function(data) as.list(data)

and

df[df$group_var == i ,] 

is the input of the function fi for each i = 0, 1, 2. That is

res$out[[1]] ==  f0(df[df$group_var == 0 ,])
> T 
res$out[[2]] ==  f1(df[df$group_var == 1 ,])
> T
res$out[[3]] ==  f2(df[df$group_var == 2 ,])
> T

My second question is related to the first. If the answer to the first question is yes, I would like to apply further operations. Specifically, I want to take the outputs of each fi and apply operations that relate them to one another. For example

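# Combine the outputs of the three per-group functions
other_special_function <- function(d1, d2, d3)
{
    ret <- cbind(d1, d2)
    # the original referred to an undefined `d`; d3 is assumed here
    ret2 <- cbind(ret, apply(d3, 2, sum))
    return(ret2)
}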

res2<- res %>% do(out2 = other_special_function(.) )

so that

res2$out2 = other_special_function(res$out[[1]], res$out[[2]] , res$out[[3]] ) 

You absolutely can do this with pipes and functions; it is very flexible. If the function you have in mind is more complicated, as you describe, I would recommend using purrr and map() rather than do() from dplyr.

Let's say you have everybody's favorite iris dataset. You can start out by using nest() from tidyr so that you have one row per group, with the data nested into mini-dataframes in a list-column.

library(dplyr)
library(tidyr)

iris %>% 
    nest(data = -Species)  # tidyr >= 1.0 syntax; older versions used nest(-Species)

#> # A tibble: 3 × 2
#>      Species              data
#>       <fctr>            <list>
#> 1     setosa <tibble [50 × 4]>
#> 2 versicolor <tibble [50 × 4]>
#> 3  virginica <tibble [50 × 4]>

Then say you have some arbitrary function that you want to apply to each mini-dataframe, and that it behaves differently depending on the species. This could be as complicated as you like.

fun <- function(df, species) {
    if (species == "setosa") {
        t(df)
    } else {
        df
    }
} 

You can use map2() from purrr to apply this function to each species/data pair, put the result in a new column of the data frame, and keep going from there if you'd like.

library(purrr)

iris %>% 
    nest(data = -Species) %>%  # tidyr >= 1.0 syntax; older versions used nest(-Species)
    mutate(output = map2(data, Species, fun))

#> # A tibble: 3 × 3
#>      Species              data            output
#>       <fctr>            <list>            <list>
#> 1     setosa <tibble [50 × 4]>    <dbl [4 × 50]>
#> 2 versicolor <tibble [50 × 4]> <tibble [50 × 4]>
#> 3  virginica <tibble [50 × 4]> <tibble [50 × 4]>
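
If each group needs an entirely different function, as in the question, one option is to keep the functions in a named list and look them up by group value instead of branching inside a single function. The following is only a sketch under assumptions: the toy `df` values, the list name `funs`, and the helper `apply_group_fun` are made up for illustration, while the three functions are the question's f0, f1 and f2.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data standing in for the question's df (values are made up)
df <- tibble(
    v1 = c(1, 5, 2, 1, 7, 5),
    v2 = c(2, 3, 1, 8, 6, 9),
    group_var = c(1, 0, 1, 2, 1, 0)
)

# The question's f0, f1, f2, stored under the group value they belong to
funs <- list(
    "0" = function(data) apply(data, 2, min),
    "1" = function(data) t(data),
    "2" = function(data) as.list(data)
)

# Hypothetical helper: pick the function matching the group and call it
apply_group_fun <- function(data, group) funs[[as.character(group)]](data)

res <- df %>%
    nest(data = -group_var) %>%
    arrange(group_var) %>%
    mutate(out = map2(data, group_var, apply_group_fun))
```

With this layout, `res$out[[1]]` holds the result of f0 on the rows where group_var is 0, `res$out[[2]]` the result of f1 on group 1, and so on. Adding a group only means adding an entry to `funs`.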

This is a flexible technique for applying arbitrary operations to data, and is especially great for modeling.
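
For the second part of the question, where the per-group outputs feed a single function that takes one argument per group, the list-column can be passed to that function with base R's do.call(). This is a rough sketch rather than a definitive answer: `combine_groups` is a hypothetical stand-in for other_special_function, and colMeans is just a placeholder for the per-group step.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One numeric summary per species, stored in a list-column
res <- iris %>%
    nest(data = -Species) %>%
    mutate(out = map(data, colMeans))

# Hypothetical combiner taking one argument per group
combine_groups <- function(d1, d2, d3) {
    ret <- cbind(d1, d2)
    cbind(ret, d3)
}

# Spread the elements of the list-column over the function's arguments
res2 <- do.call(combine_groups, unname(res$out))
```

Here `res2` is a matrix with one row per measurement and one column per species. The arguments are matched by position, so the order of the rows in `res` determines which group ends up as d1, d2 and d3.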
