How to profile the loading of an R package

Question

I have written an R package that takes ages (> 10s, sometimes up to 20-30s!) to load.

Every time the package loads, such as at the build step "** testing if installed package can be loaded" or when directly calling library("my.package"), nothing happens for 20s.

This makes everything painfully slow during development: building documentation, building the package, running R check...

Of course I have my suspicions (looking at you, dodgy dependency), but I need to gather evidence before axing it.

Is there a way to profile the loading of the package, in order to identify the cause? Or more generally, how can I figure out what is happening under the hood?

Answer 1

So an issue with using the detach method from @davide-lorino is that if there are entangled Depends or Imports, it will fail, and fail hard.

A better method is to use a future backend that loads each of the imports in a clean R session, and time how long it takes to load each one via library().
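The clean-session idea can be sketched in base R. Note that this is only an illustration under stated assumptions: it starts a fresh Rscript process per package instead of using a future backend, it is not the API of the package linked in this answer, and time_load_in_fresh_session is a hypothetical helper name. The packages timed here are stand-ins that ship with R; substitute the Imports of your own package.

```r
# Hypothetical helper: time library(pkg) in a brand-new R process,
# so nothing is already loaded when the clock starts.
time_load_in_fresh_session <- function(pkg) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # [[3]] is the "elapsed" entry of the system.time() result.
  expr <- sprintf(
    "cat(system.time(suppressPackageStartupMessages(library(%s)))[[3]])",
    pkg
  )
  out <- system2(rscript, c("-e", shQuote(expr)), stdout = TRUE)
  # If the child process failed to load the package, report NA.
  if (length(out) == 0L) return(NA_real_)
  as.numeric(out[length(out)])
}

# Stand-in packages that ship with R; replace with your package's Imports.
pkgs <- c("tools", "splines", "parallel")
timings <- vapply(pkgs, time_load_in_fresh_session, numeric(1))

# Slowest first.
print(sort(timings, decreasing = TRUE))
```

Each timing includes everything the package pulls in when it loads, because the child process starts with only the default packages attached. The startup cost of the child process itself is not counted, since the clock runs inside it.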

I implemented this in a package that might be useful to others: https://github.com/rmflight/importedPackageTimings

Answer 2

You can determine which library takes the longest to load by benchmarking a call that loads each of the libraries you are testing.

The key is to make sure that you unload the libraries. If a library is still loaded when you load it again, library() will determine that it is already loaded and return early. On a typical benchmark of 100 runs, 1 of them will represent the time it took to load your library, and the remaining 99 will represent the time it took library() to figure out that the library is already loaded. The result (duration) will then be an aggregate of the 100 runs, yielding a very small number with almost no variance between the results.
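A minimal sketch of that pitfall, using only base R. The package name is a stand-in that ships with R, and the loop uses system.time() rather than any particular benchmarking package, which is an assumption about tooling.

```r
# Naive benchmark: the package is never unloaded between runs.
pkg <- "splines"  # stand-in; use the package you are investigating

naive_times <- vapply(seq_len(100), function(i) {
  # After the first iteration the package is already attached,
  # so library() returns almost immediately.
  system.time(library(pkg, character.only = TRUE))[["elapsed"]]
}, numeric(1))

# Mostly measures the "already loaded" check, not the load itself.
print(summary(naive_times))
```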

What you really want is to unload the library before each run, so that every run measures an actual load.

That gives a less surprising result for our efforts.
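A sketch of the corrected loop, again in base R with a stand-in package. The detach() call is inlined here so the block runs on its own; the loop structure and the number of runs are illustrative choices.

```r
# Benchmark that detaches and unloads the package before every run.
pkg <- "splines"  # stand-in; use the package you are investigating
search_item <- paste0("package:", pkg)

load_times <- vapply(seq_len(10), function(i) {
  # Make sure the package is not attached before timing the load.
  if (search_item %in% search()) {
    detach(search_item, unload = TRUE, character.only = TRUE)
  }
  system.time(library(pkg, character.only = TRUE))[["elapsed"]]
}, numeric(1))

print(summary(load_times))
```

Keep in mind that detaching a package does not unload the packages it imports. After the first run those stay loaded, so later runs mostly measure the package itself rather than its whole dependency chain.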

P.S. A detach_package() helper for the unloading step can be implemented like this:

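# Detach a package and unload its namespace.
# Accepts a bare name, or a string when character.only = TRUE.
detach_package <- function(pkg, character.only = FALSE)
{
  if(!character.only)
  {
    # Turn the unquoted package name into a string.
    pkg <- deparse(substitute(pkg))
  }
  search_item <- paste("package", pkg, sep = ":")
  # Loop in case the package appears on the search path more than once.
  while(search_item %in% search())
  {
    detach(search_item, unload = TRUE, character.only = TRUE)
  }
}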
