Simulate new rows: Python is much slower than SAS. How to speed it up?

This problem comes from a model implementation task. I have some data and need to simulate new rows, where some of the variable values depend on the values in the previous row and on a random number r.

For example, say I have

AsOfDate  Var1  Var2    r
6/4/2018    A   0.3     0.2

After creating two new rows, the output would be

AsOfDate    Var1    Var2    r
6/4/2018    A       0.3    0.2
6/5/2018    B       0.06   0.95
6/6/2018    A       0.057

The logic is: as of 6/4, r = 0.2 is less than Var2, so as of 6/5 Var1 = B and Var2 = 0.3 * 0.2 = 0.06. As of 6/5, r = 0.95 is greater than Var2, so as of 6/6 Var1 = A and Var2 = 0.06 * 0.95 = 0.057.

I apologize if this is confusing; I'm trying my best to describe it. I can't think of a way to do this without a for loop. I ran the following simple SAS and Python code just to compare the speed. To my surprise, Python/pandas is much slower than the SAS data step. I'm no expert in Python, so I'm wondering whether there is a better way to do this that runs much faster. Thanks in advance for your help.
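The rule described above can be sketched as a plain Python loop that collects rows in a list and builds the DataFrame once at the end. This is only an illustration under stated assumptions: the function name `simulate_rows` is hypothetical, the random numbers are passed in as a list so the example is reproducible, the last row has no r because none has been drawn for it yet, and the case r == Var2 (which the question does not cover) is treated as "A".

```python
from datetime import date, timedelta

import pandas as pd


def simulate_rows(start_date, var1, var2, r_values):
    """Hypothetical helper: one new row per value in r_values."""
    rows = []
    current_date = start_date
    for r in r_values:
        # Record the current state together with the r drawn for it.
        rows.append((current_date, var1, var2, r))
        # Next row: Var1 is B when r < Var2, otherwise A (tie rule is an assumption).
        var1 = "B" if r < var2 else "A"
        # Next row: Var2 is the previous Var2 multiplied by the previous r.
        var2 = var2 * r
        current_date += timedelta(days=1)
    # Final row: no r has been drawn for it yet.
    rows.append((current_date, var1, var2, None))
    # Build the DataFrame once, after the loop.
    return pd.DataFrame(rows, columns=["AsOfDate", "Var1", "Var2", "r"])


df = simulate_rows(date(2018, 6, 4), "A", 0.3, [0.2, 0.95])
print(df)
```

With the two r values from the example, this yields the three rows shown in the table (up to floating-point rounding in Var2).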

import time
import pandas as pd

a=pd.DataFrame(data={'id':[1],'val':[2]})
tick=time.time()
b=pd.DataFrame()
for n in range(10000):
    a['id']=a['id']+1
    a['val']=a['val']+(n+1)
    b=pd.concat([b,a])  # copies all rows accumulated so far on every iteration
tock=time.time()
print(tock-tick)

Time taken: 7.54027533531189 sec

data test;
input id val;
datalines;
1 2
;
run;

%let _timer_start = %sysfunc(datetime());

data test(drop=i);
    set test;
    do i=1 to 10000;
        id=id+1;
        val=val+(i+1);
        output;
    end;
run;

data _null_;
  dur = datetime() - &_timer_start;
  put 30*'-' / ' TOTAL DURATION:' dur time13.2 / 30*'-';
run;

Time taken: 0.01 sec

The most straightforward answer is: because you have chosen the most inefficient way :)

I.e. this code (not really optimised):

import time
import pandas as pd
tick=time.time()
n=0
a = {
    'id': 1,
    'val': 2,
}
data = []
for n in range(10000):
    a['id'] = a['id']+1
    a['val'] = a['val']+(n+1)
    data.append([a['id'], a['val']])
df = pd.DataFrame(data, columns=['id', 'val'])
tock=time.time()
print(tock-tick)

This does the same thing (unless I made some stupid mistake) a few hundred times faster, and is probably not much slower than SAS.

If you can, build your data outside pandas:
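Calling pd.concat inside the loop is slow because each call copies every row accumulated so far, so the total work grows roughly quadratically with the number of rows, while appending to a list and building the DataFrame once does not. A quick way to check that the two approaches produce the same values is sketched below; the function names `build_with_concat` and `build_with_list` are hypothetical, and a small n is used so that the slow version finishes quickly.

```python
import pandas as pd


def build_with_concat(n):
    # Approach from the question: grow a DataFrame with concat in the loop.
    a = pd.DataFrame({"id": [1], "val": [2]})
    b = pd.DataFrame()
    for i in range(n):
        a["id"] = a["id"] + 1
        a["val"] = a["val"] + (i + 1)
        b = pd.concat([b, a])
    return b.reset_index(drop=True)


def build_with_list(n):
    # Approach from the answer: collect plain rows, build the DataFrame once.
    row_id, val = 1, 2
    rows = []
    for i in range(n):
        row_id += 1
        val += i + 1
        rows.append((row_id, val))
    return pd.DataFrame(rows, columns=["id", "val"])


slow = build_with_concat(200)
fast = build_with_list(200)

# Compare values column by column (dtypes are deliberately not compared).
same = (slow["id"].tolist() == fast["id"].tolist()
        and slow["val"].tolist() == fast["val"].tolist())
print(same)
```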

from itertools import accumulate
import pandas as pd

# id runs from 2 to 10001; val is 2 plus the running sum 1 + 2 + ... + k
a = list(range(2, 10002))
b = [2+i for i in accumulate(range(1,10001))]
df = pd.DataFrame(data={'id':a,'val':b})
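The same idea may carry over to the Var1/Var2 example from the question. Since each Var2 is the previous Var2 times the previous r, Var2 on row k equals the starting value times the product of all earlier r values, which NumPy can compute with a cumulative product. The sketch below assumes all random numbers can be drawn up front, that a tie (r == Var2) maps to "A", and that the hypothetical function name `simulate_vectorized` is acceptable. If the real model has a recurrence that is not a pure product, this shortcut does not apply and a loop is still needed.

```python
import numpy as np
import pandas as pd


def simulate_vectorized(start_date, var1, var2, r_values):
    r = np.asarray(r_values, dtype=float)
    # Var2 on row k = initial Var2 * product of r on all earlier rows.
    var2_all = var2 * np.concatenate(([1.0], np.cumprod(r)))
    # Var1 on row k+1 is decided by comparing r and Var2 on row k.
    var1_next = np.where(r < var2_all[:-1], "B", "A")
    var1_all = np.concatenate(([var1], var1_next))
    # One calendar day per row, starting at start_date.
    dates = pd.date_range(start_date, periods=len(r) + 1, freq="D")
    # The last row has no r yet, so pad with NaN.
    r_all = np.append(r, np.nan)
    return pd.DataFrame(
        {"AsOfDate": dates, "Var1": var1_all, "Var2": var2_all, "r": r_all}
    )


df = simulate_vectorized("2018-06-04", "A", 0.3, [0.2, 0.95])
print(df)
```

Because the products are accumulated in a different order than in a row-by-row loop, Var2 can differ in the last few floating-point digits, which could in principle flip a comparison when r and Var2 are almost equal.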
