
R for-loop vs Python for-loop Performance

There is already some discussion on this topic, but it doesn't quite address my question. Apologies in advance if it does and I didn't realize.

Here are two simple for-loop setups in R and Python:

R for-loop (took 3.41s on my computer):

datafr  <- matrix(0,nrow=24*365,ncol=15)
matrix3d  <- array(0,dim=c(24*365,12,7))

#================
start_time <- Sys.time()
for (p in 1:150) {
  for (m in 1:2) {
    l  <- rep(0.7*runif(365),each=24)
    a  <- rep(0.7*runif(365),each=24)
    pp <- 1+floor(15*runif(7))
    for (j in 1:7) {
      bun     <- datafr[,pp[j]]*a
      for (h in 2:(24*365)) {
        matrix3d[h,m,j] <- matrix3d[h-1,m,j]*l[h] + bun[h]
      }  
    }
  }
}
Sys.time() - start_time
#================
#took 3.41s on my computer

And here's the same code in Python (took 17.87s on my computer):

import numpy as np
import time
import pandas as pd

datafr= pd.DataFrame(0, index=range(24*365),columns=range(15))
matrix3d = np.zeros((24*365,12,7))

#=============
start_time = time.time()
for p in range(150):
    for m in range(2):
        l = np.repeat(0.7*np.random.random(365),24)
        a = np.repeat(0.7*np.random.random(365),24)
        pp = 1+np.floor(15*np.random.random(7))
        for j in range(7):
            bun = np.asarray(datafr.iloc[:,int(pp[j])-1],dtype=np.float32)*a
            for h in range(1,(24*365)):
                matrix3d[h,m,j] = matrix3d[h-1,m,j]*l[h]+bun[h] #bottleneck
round(time.time() - start_time,2)
#================
#took 17.87s on my computer

R is over 5 times faster than Python here. Is this to be expected? I read that Python's for-loop is faster than R's, unless you use R's lapply, in which case R beats Python if the number of steps is greater than 1000 ( https://datascienceplus.com/loops-in-r-and-python-who-is-faster/ ), but that is not what I see here (I'm not using lapply). Can the Python script be improved in a way that doesn't use decorators, magic functions, generators, etc.? I'm simply curious. Thanks.
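One possible direction, as a minimal sketch rather than a measured fix: the line marked as the bottleneck indexes a 3-D NumPy array one element at a time, and the pandas column is re-extracted with `iloc` inside the `j` loop. Per-element indexing of a NumPy array from a Python loop is generally slower than indexing a plain Python list, so the sketch below copies each column into a list, runs the recurrence there, and writes the result back in one assignment. The function name `run_recurrence`, its parameters, and the use of `np.random.default_rng` are assumptions made for illustration; it also assumes the data frame has been converted once to a float array (for example with `datafr.to_numpy(dtype=float)`) and keeps everything in float64 instead of casting to float32.

```python
import numpy as np

def run_recurrence(values, n_outer=150, n_m=2, n_j=7, seed=0):
    """Same loop structure as the question, with the inner loop on Python lists.

    values: 2-D float array of shape (24 * n_days, n_cols), standing in for
    the pandas DataFrame converted once, outside the loops (assumption).
    """
    rng = np.random.default_rng(seed)
    n_hours, n_cols = values.shape
    n_days = n_hours // 24
    matrix3d = np.zeros((n_hours, 12, n_j))
    for p in range(n_outer):
        for m in range(n_m):
            l = np.repeat(0.7 * rng.random(n_days), 24)
            a = np.repeat(0.7 * rng.random(n_days), 24)
            pp = rng.integers(0, n_cols, size=n_j)  # 0-based column picks
            l_list = l.tolist()                     # plain floats for the hot loop
            for j in range(n_j):
                bun = (values[:, pp[j]] * a).tolist()
                col = matrix3d[:, m, j].tolist()
                for h in range(1, n_hours):
                    col[h] = col[h - 1] * l_list[h] + bun[h]
                matrix3d[:, m, j] = col             # single write-back per column
    return matrix3d

# Usage with the question's shapes (an all-zero input, as in the original)
if __name__ == "__main__":
    result = run_recurrence(np.zeros((24 * 365, 15)))
    print(result.shape)
```

Whether this closes the gap with R depends on the machine and the library versions, so it is worth timing both variants locally before drawing conclusions.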

R loops used to be slow around 2014 or 2015. They aren't slow anymore: software and programming languages evolve over time, and such claims don't stay true forever. JS is a perfect example of this.

R for loops are not slow, and you can use them whenever you want. However, R's garbage collector is slow, and you shouldn't grow a vector inside a loop, because that copies it multiple times. If you avoid that, you are almost always in safe hands.
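The same grow-versus-preallocate pattern can be illustrated in NumPy terms. This is an analogy in Python, not R code, and the function names are hypothetical: `np.append` returns a new array on every call, so calling it in a loop copies all existing elements each time, while filling a preallocated array allocates once.

```python
import numpy as np

def grow_with_append(n):
    # Anti-pattern: each np.append allocates a new array and copies the old one.
    out = np.empty(0)
    for i in range(n):
        out = np.append(out, i * 0.5)
    return out

def fill_preallocated(n):
    # Preferred: allocate the full array once, then assign by index.
    out = np.empty(n)
    for i in range(n):
        out[i] = i * 0.5
    return out
```

Both functions return the same values; only the number of allocations and copies differs.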

You could also try the set method from data.table if you need more speed from the loop, or parallelize it.
