
How can I speed up the process of summing values in this Python code?

I have df.Ah, a column of a DataFrame, which contains only positive values or zeros. I want to store in another DataFrame the sum of each run of values that precedes a zero, followed by that zero itself.

For example, df.Ah = [1, 2, 3, 0, 46, 0, 24, 1] should give the output [6, 0, 46, 0, 25].

My attempt was:

import numpy as np
import pandas as pd

lst = df.Ah
lst1 = pd.DataFrame(index=np.arange(100000000))  # if I replace values of a pre-existing column, the code speeds up

summ = 0
for i, elem in enumerate(lst):
    if elem != 0:
        summ = summ + elem
    else:
        if summ:
            lst1.loc[i, 'Ah'] = summ
        # note: this writes to the same row i, replacing the sum stored just above
        lst1.loc[i, 'Ah'] = elem
        summ = 0
...

if summ:
    lst1.iloc[i+1, 0] = summ

# If I print the index i on each iteration, I get about 100,000 prints per minute,
# which means it would take around five hours to check the 31 million values
# in my dataframe, and I don't have that much time for this basic operation.

Is there a way to speed up this code?
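For reference, the intended logic can be sketched in plain Python by collecting results in a list instead of assigning row by row with .loc, which is likely where much of the loop's time goes. This is only a minimal sketch: the function name sum_runs_loop is hypothetical, and it assumes the values are non-negative, as stated in the question.

```python
def sum_runs_loop(values):
    """Sum each run of non-zero values; keep every zero as its own entry."""
    out = []
    running = 0
    for v in values:
        if v != 0:
            running += v
        else:
            # Flush the pending run (if any) before recording the zero itself.
            if running:
                out.append(running)
            out.append(0)
            running = 0
    # Flush a trailing run that is not followed by a zero.
    if running:
        out.append(running)
    return out


print(sum_runs_loop([1, 2, 3, 0, 46, 0, 24, 1]))  # [6, 0, 46, 0, 25]
```

The returned list can be wrapped once at the end, for example with pd.DataFrame({'Ah': result}), so the DataFrame is built in a single step rather than one cell at a time.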

You can use the cumsum method from pandas to give every run of non-zero values, and every zero, its own group label, and then sum df.Ah within each group. This avoids the Python loop, which is slow on large DataFrames.

Here's an example of how you could do this:

import pandas as pd

# Mark the zeros in df.Ah
is_zero = df.Ah.eq(0)

# A new group starts at every zero and at the first value after a zero
starts = is_zero | is_zero.shift(fill_value=False)

# The cumulative sum of the start flags gives each group a distinct label
group_id = starts.cumsum()

# Sum within each group: non-zero runs collapse to their total, zeros stay 0
df1 = df.Ah.groupby(group_id).sum().reset_index(drop=True).to_frame('Ah')

For df.Ah = [1, 2, 3, 0, 46, 0, 24, 1] the group labels are [0, 0, 0, 1, 2, 3, 4, 4], so df1.Ah is [6, 0, 46, 0, 25]. This should be much faster than the loop, because cumsum and groupby run in compiled code instead of performing one Python-level .loc assignment per row.
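The run sums from the question's example can also be computed directly on the underlying NumPy array with np.add.reduceat, which sums an array between consecutive start positions. The sketch below is one possible variant, not a benchmarked recommendation; the function name sum_runs_reduceat is hypothetical.

```python
import numpy as np


def sum_runs_reduceat(values):
    """Sum each run of non-zero values; keep every zero as its own entry."""
    arr = np.asarray(values)
    if arr.size == 0:
        return arr.copy()
    is_zero = arr == 0
    # A segment starts at position 0, at every zero, and right after every zero.
    starts = np.ones(arr.size, dtype=bool)
    starts[1:] = is_zero[1:] | is_zero[:-1]
    # reduceat sums arr over [start_k, start_{k+1}) for consecutive start indices.
    return np.add.reduceat(arr, np.flatnonzero(starts))


print(sum_runs_reduceat([1, 2, 3, 0, 46, 0, 24, 1]).tolist())  # [6, 0, 46, 0, 25]
```

With a DataFrame, the input would be df.Ah.to_numpy() and the result could be wrapped as pd.DataFrame({'Ah': result}).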

