Programming using the tidyverse: speed issues

We released the quickpsy package a few years ago (see the paper in the R Journal). The package used base R functions, but also made extensive use of functions from what was then called the Hadleyverse. We are now developing a new version that mostly uses tidyverse functions and incorporates the new non-standard evaluation approach, and we found that it is much slower (more than four times slower). For example, we found that purrr::map is much slower than dplyr::do (which is deprecated):

library(tidyverse)

system.time(
  mtcars %>% 
    group_by(cyl) %>% 
    do(head(., 2))
  )

system.time(
  mtcars %>% 
    group_by(cyl) %>% 
    nest() %>% 
    mutate(temp = map(data, ~head(., 2))) %>% 
    unnest(temp)
)

We also found that functions like pull are very slow.
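One way to check a claim like this is to time pull() against base R's `[[` over many repetitions, since a single call is too fast to measure reliably. The sketch below is only an illustration: the repetition count `n` is an arbitrary choice, and no timings are shown because they depend on the machine and the dplyr version.

```r
library(dplyr, warn.conflicts = FALSE)

n <- 1000  # arbitrary repetition count, chosen only for illustration

# Both expressions extract the same column as a plain vector
same_result <- identical(pull(mtcars, mpg), mtcars[["mpg"]])
print(same_result)

# Time many repetitions of each approach
time_pull <- system.time(for (i in seq_len(n)) pull(mtcars, mpg))
time_base <- system.time(for (i in seq_len(n)) mtcars[["mpg"]])
print(time_pull)
print(time_base)
```

If the gap matters inside a tight loop, extracting the column once outside the loop sidesteps the per-call overhead either way.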

We are not sure whether the tidyverse is simply not meant for this type of programming or whether we are not using it properly.

slice() is the proper tool if you want the first two rows of each group. Both do() and nest() %>% mutate(map()) %>% unnest() are too heavy and use more memory:

library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(purrr)

library(tidyverse)

system.time(
  mtcars %>% 
    group_by(cyl) %>% 
    do(head(., 2))
)
#>    user  system elapsed 
#>   0.065   0.003   0.075

system.time(
  mtcars %>% 
    group_by(cyl) %>% 
    nest() %>% 
    mutate(temp = map(data, ~head(., 2))) %>% 
    unnest(temp)
)
#>    user  system elapsed 
#>   0.024   0.000   0.024

system.time(
  mtcars %>% 
    group_by(cyl) %>% 
    slice(1:2)
)
#>    user  system elapsed 
#>   0.002   0.000   0.002

Created on 2018-10-23 by the reprex package (v0.2.1.9000)
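Before comparing speed, it helps to confirm that the three approaches really return the same rows, and to repeat each call so that a single noisy run does not dominate. The following is a rough sketch under a few assumptions: the wrapper names (`via_do`, `via_nest`, `via_slice`) and the `normalise` helper are made up for this illustration, `select(-data)` is added so that only `temp` is unnested regardless of the tidyr version, and the repetition count is arbitrary.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(purrr)

# The three approaches from the timings above, wrapped as functions
via_do <- function() mtcars %>% group_by(cyl) %>% do(head(., 2))
via_nest <- function() {
  mtcars %>%
    group_by(cyl) %>%
    nest() %>%
    mutate(temp = map(data, ~ head(., 2))) %>%
    select(-data) %>%  # added here: drop the nested column before unnesting
    unnest(temp)
}
via_slice <- function() mtcars %>% group_by(cyl) %>% slice(1:2)

# Hypothetical helper: drop grouping, align column order, sort rows
normalise <- function(x) {
  x <- as.data.frame(ungroup(x))[, names(mtcars)]
  x <- x[do.call(order, unname(as.list(x))), ]
  rownames(x) <- NULL
  x
}

# Do do() and nest()/unnest() give the same rows as slice()?
same_as_slice <- c(
  do   = isTRUE(all.equal(normalise(via_do()), normalise(via_slice()))),
  nest = isTRUE(all.equal(normalise(via_nest()), normalise(via_slice())))
)
print(same_as_slice)

# Repeat each approach n times; timings vary by machine and package version
n <- 50
for (f in list(via_do, via_nest, via_slice)) {
  print(system.time(for (i in seq_len(n)) f()))
}
```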

See also the benchmark results in this tidyr issue.

For this particular example, the slowness caused by the nest and unnest computations can be avoided by using group_modify:

system.time(
  mtcars %>% 
    group_by(cyl) %>% 
    group_modify(~head(., 2))
)
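As a quick sanity check, the group_modify() result can be compared with the slice() result from the earlier answer. This is a minimal sketch, assuming a dplyr version recent enough to provide group_modify(); the `align` helper is a made-up name for this illustration, and it reorders columns and rows because group_modify() places the grouping column first.

```r
library(dplyr, warn.conflicts = FALSE)

via_group_modify <- mtcars %>% group_by(cyl) %>% group_modify(~ head(., 2))
via_slice        <- mtcars %>% group_by(cyl) %>% slice(1:2)

# Hypothetical helper: drop grouping, align column order, sort rows
align <- function(x) {
  x <- as.data.frame(ungroup(x))[, names(mtcars)]
  x <- x[do.call(order, unname(as.list(x))), ]
  rownames(x) <- NULL
  x
}

results_match <- isTRUE(all.equal(align(via_group_modify), align(via_slice)))
print(results_match)
```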
