How to reproduce a loop using a function in R package purrr

I often use loops in my code. I was told that rather than using loops, I should be using functions, and that a loop can be rewritten using a function in the R package purrr.

As an example, the code below counts, for each species in the iris dataset, the rows where Sepal.Width <= 3:

 library(dplyr)
 #dataframe to put the output in
 sepaltable <- data.frame(Species=character(),
                     Total=numeric(), 
                     stringsAsFactors=FALSE) 

 #list of species to iterate over
 specieslist<-unique(iris$Species)

 #loop to populate the dataframe with the name of the species 
 #and the count of how many there were in the iris dataset

 for (i in  seq_along (specieslist)){
 a<-paste(specieslist[i])  
 b<- filter(iris,`Species`==a & Sepal.Width <=3)
 c<-nrow(b)
 sepaltable[i,"Species"]<-a
 sepaltable[i,"Total"]<-c
 }

The loop populates the sepaltable data frame with the name of each species and the number of rows for that species in the iris dataset with Sepal.Width <= 3. I want to reproduce the effect of this loop using a function from the R package purrr, without writing a loop. Can anyone help?

We can group by Species and take the sum of a logical expression in dplyr:

library(dplyr)
iris %>% 
   group_by(Species) %>%
   summarise(Total = sum(Sepal.Width <=3))

Or, if purrr is needed:

library(purrr)
map_dfr(specieslist,  ~iris %>% 
      summarise(Total = sum(Species == .x & Sepal.Width <=3),
          Species = .x )) %>%
   select(Species, Total)
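
A more compact purrr variant, offered as a sketch and not as part of the answer above: split the Sepal.Width column by species, then count the qualifying values in each piece with map_int(). The names widths_by_species, counts and sepaltable_purrr are illustrative only.

```r
library(purrr)

# Split Sepal.Width into one numeric vector per species (base R split()).
widths_by_species <- split(iris$Sepal.Width, iris$Species)

# map_int() returns a named integer vector: one count per species.
counts <- map_int(widths_by_species, ~ sum(.x <= 3))

# Assemble the same two-column layout as the loop's sepaltable.
sepaltable_purrr <- data.frame(Species = names(counts),
                               Total = unname(counts),
                               stringsAsFactors = FALSE)
sepaltable_purrr
```

Because map_int() insists that every call returns a single integer, a mistake in the counting expression fails loudly instead of silently producing a list.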

NOTE: the map and apply family functions (lapply/sapply/vapply/rapply/mapply/Map/apply) are all loops internally.

For the type of example you provided, akrun's answer is the most straightforward approach, especially since you are already using dplyr. The dplyr package is written to handle basic data table summaries, especially the grouped statistics used in your example.
But in more complicated cases, most of the time that you would write a loop you could accomplish the same thing using a function and the apply family.

Using your example:

# write function that does the stuff you put in your loop
summSpecies <- function(a) {
      b<- filter(iris,`Species`==a & Sepal.Width <=3)
      c<-nrow(b)
      return(c)
}

# apply the function to each element of your list
sapply(specieslist,summSpecies) #sapply simplifies the output to return a vector (in this case)
#[1]  8 42 33

# You can build this into a data frame
sepaltable <- data.frame(Species=specieslist,
                         Total=sapply(specieslist,summSpecies), 
                         stringsAsFactors=FALSE) 
sepaltable
#      Species Total
# 1     setosa     8
# 2 versicolor    42
# 3  virginica    33
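
If you want the result type checked, vapply() is a stricter alternative to sapply(). The sketch below is a variant of the code above, not part of the original answer: count_narrow is a hypothetical helper that uses only base R, so it does not depend on dplyr being attached.

```r
# Hypothetical helper: count rows of one species with Sepal.Width <= 3.
count_narrow <- function(species) {
  sum(iris$Species == species & iris$Sepal.Width <= 3)
}

# Iterate over the level names (character), so the result is a named vector.
species_names <- levels(iris$Species)

# integer(1) declares that each call must return exactly one integer.
totals <- vapply(species_names, count_narrow, integer(1))

sepaltable_vapply <- data.frame(Species = species_names,
                                Total = unname(totals),
                                stringsAsFactors = FALSE)
sepaltable_vapply
```

Unlike sapply(), vapply() raises an error if any call returns something other than the declared type and length, which makes it safer inside functions and packages.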

For what it's worth, I did a timing comparison of the methods proposed here:

# Unit: microseconds
#         expr      min        lq      mean    median        uq       max neval
#   ForLoop.OP 2548.519 2725.9020  3107.153  2819.837 3006.5915 11654.194   100
#  Apply.Brian 2385.638 2534.2390  2810.854  2625.050 2822.5145  9641.172   100
#  dplyr.akrun  721.136  837.6065  1180.244   864.604  902.9815 13440.076   100
#  purrr.akrun 3572.656 3783.2845  4147.900  3874.095 4073.5690 10517.602   100
# purrr.Axeman 2440.973  2527.322 2866.7686 2586.8960  2774.097  9577.360   100

It should be no surprise that the existing function optimized for this kind of task (the dplyr group_by/summarise approach) is the clear winner, with a median time roughly a third of any other method's. The for loop lags slightly behind the apply family approach, and the purrr map_dfr version is the slowest of the five.
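
The timing code itself is not shown in this thread. A comparison of this shape could be produced with the microbenchmark package; that choice is an assumption based on the output format, and the sketch below times only two of the methods. Absolute numbers will differ from machine to machine.

```r
library(dplyr)
library(microbenchmark)  # assumed; the original timing code is not shown

specieslist <- unique(iris$Species)

# Apply-family version, condensed from the summSpecies answer.
summSpecies <- function(a) {
  nrow(filter(iris, Species == a & Sepal.Width <= 3))
}

# Time each expression 100 times, matching neval in the table.
bench <- microbenchmark(
  Apply = sapply(specieslist, summSpecies),
  dplyr = iris %>%
    group_by(Species) %>%
    summarise(Total = sum(Sepal.Width <= 3)),
  times = 100
)
print(bench)
```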
