Simulation in R with nested loops runs slowly

I am using R for an agent-based historical simulation, and the code works, but slowly. It loops through timesteps, updating one data frame of agent attributes and another holding a summary of the overall state after each timestep (a generation). Looping above that are a few runs for each parameter setting. Although it begins with 100 agents, under extreme settings (high S, low A) the population can grow above a thousand after, e.g., five generations. I read that updating a matrix is faster than updating a data frame, so I converted the summary to a matrix. But I also hear that vectorisation is best, so before I change the agents to a matrix, could anyone suggest a way to make the code more vectorised? Here is the code:

NextGeneration <- function(agent, N, S, A) {
   # N is number of agents.
   # S is probability that an agent with traditional fertility will have 2 sons surviving to the age of inheritance.
   # A is probability that an heir experiencing division of estate changes his fertility preference from traditional to planned.
   # find number of surviving heirs for each agent
   excess <- runif(N)  # get random numbers 
   heir <- rep(1, N)  # everyone has at least 1 surviving heir 

   # if agent has traditional fertility 2 heirs may survive to inherit
   heir[agent$fertility == "Trad" & excess < S] <- 2  

   # next generation more numerous if spare heirs survive

   # new agents have vertical inheritance but also guided variation. 
   # first append to build a vector, then combine into new agent dataframe  
   nextgen.fertility <- NULL
   nextgen.lineage <- NULL

   for (i in 1:N) {

      if (heir[i]==2) {

         # two agents inherit from one parent.
         for (j in 1:2) {

            # A is probability of inheritance division event affecting fertility preference in new generation.
            if (A > runif(1)) {
               nextgen.fertility <- c(nextgen.fertility, "Plan") 
            } else {
               nextgen.fertility <- c(nextgen.fertility, agent$fertility[i])
            }
            nextgen.lineage <- c(nextgen.lineage, agent$lineage[i])
         }
      } else {
         nextgen.fertility <- c(nextgen.fertility, agent$fertility[i])
         nextgen.lineage <- c(nextgen.lineage, agent$lineage[i])
      }
   }
   # assemble new agent frame  
   nextgen.agent <- data.frame(nextgen.fertility, nextgen.lineage, stringsAsFactors = FALSE) 
   names(nextgen.agent) <- c("fertility", "lineage")
   nextgen.agent
}

So the agents begin like this (Trad = traditional):

ID      fertility   lineage
1       Trad        1
2       Trad        2
3       Trad        3
4       Trad        4
5       Trad        5

and after a few timesteps (generations) of random changes they end up something like this:

ID      fertility   lineage
1       Plan       1
2       Plan       1
3       Trad       2
4       Plan       3
5       Trad       3
6       Trad       4
7       Plan       4
8       Plan       4
9       Plan       4
10      Plan       5
11      Trad       5

Indeed, it would be more efficient to encode fertility as 0 and 1, and you could then even store the agents in an integer matrix.

Anyhow, the code as it stands can be simplified a lot, so here is a vectorized solution that still uses your data.frame:

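NextGen <- function(agent, N, S, A) {
  excess <- runif(N)
  # rows of traditional-fertility agents that leave two surviving heirs
  v1 <- which(agent$fertility == "Trad" & excess < S)
  # keep every agent once, then append a second copy of each two-heir parent
  nextgen.agent <- agent[c(seq_len(N), v1), ]
  # positions of both heirs of each divided estate; N + seq_along(v1) is
  # empty when v1 is empty, whereas seq.int(N+1, N) would count backwards
  heirs <- c(v1, N + seq_along(v1))
  # each such heir switches to "Plan" with probability A
  nextgen.agent$fertility[heirs] <- ifelse(A > runif(length(heirs)), "Plan", "Trad")
  nextgen.agent
}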

Testing with a sample agent data frame as follows:

agentDF <- data.frame(fertility = "Trad", lineage = 1:50, stringsAsFactors = FALSE)

# use microbenchmark library to compare performance
microbenchmark::microbenchmark(
  base = {
    res1 <- NextGeneration(agentDF, 50, 0.8, 0.8) # note I fixed the two variable typos in your function
  }, 
  new = {
    res2 <- NextGen(agentDF, 50, 0.8, 0.8)
  }, 
  times = 100
)

## Unit: microseconds
## expr      min        lq     mean    median       uq       max neval
## base 1998.533 2163.8605 2446.561 2222.8200 2286.844 14413.173   100
##  new  282.032  304.1165  329.552  320.3255  348.488   467.217   100
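
The 0/1 encoding mentioned at the start of this answer could be sketched roughly as follows. This is only an illustration under stated assumptions, not something benchmarked here: the function name NextGenMat, the coding 0L = Trad / 1L = Plan, the column names, and the parameter values in the driver loop are all made up for the example. It also preallocates the per-generation summary matrix rather than growing it.

```r
# Hypothetical integer encoding (an assumption, not from the original post):
# column "fertility": 0L = Trad, 1L = Plan; column "lineage": lineage id.
NextGenMat <- function(agent, S, A) {
  N <- nrow(agent)
  trad <- agent[, "fertility"] == 0L
  # traditional agents leave a second surviving heir with probability S
  two <- which(trad & runif(N) < S)
  # every agent once, plus a second copy of each two-heir parent
  nextgen <- agent[c(seq_len(N), two), , drop = FALSE]
  # both heirs of a divided estate switch to Plan with probability A
  heirs <- c(two, N + seq_along(two))
  nextgen[heirs, "fertility"] <- as.integer(runif(length(heirs)) < A)
  nextgen
}

set.seed(1)
agents <- cbind(fertility = rep(0L, 100), lineage = 1:100)

generations <- 5
# preallocate the summary: one row per generation
summary.mat <- matrix(NA_integer_, nrow = generations, ncol = 2,
                      dimnames = list(NULL, c("N", "n.plan")))
for (g in seq_len(generations)) {
  agents <- NextGenMat(agents, S = 0.8, A = 0.3)
  # population size and number of planned-fertility agents
  summary.mat[g, ] <- c(nrow(agents), sum(agents[, "fertility"]))
}
summary.mat
```

Because the matrix carries its own row count, N no longer needs to be passed in separately. Whether this beats the data.frame version in practice would need to be checked with microbenchmark on realistic population sizes.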
