Making a loop in Pandas faster

Question

I need to make a loop in pandas faster. The data is a time series. The code below works, but it is slow on a very large df.

It iterates through the df and, at the first 0 of each run of zeros in column A (only the first zero of a run counts; the df has many such runs), calculates the absolute difference between the column B values one period before and one period after that row. It then stores the results in a new df with a column called 'Delta'.

I bet I can do something with loc, but I cannot figure out how.

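import pandas as pd

deltas = []
indexes = []
i = 0
for idx, row in df.iterrows():
    # first zero of a run: A is 0 here and non-zero on the previous row
    if df.A[i] == 0 and df.A[i-1] != 0:
        # absolute difference between B one period before and one period after
        deltas.append(abs(df.B.shift(periods=1)[i] - df.B.shift(periods=-1)[i]))
        indexes.append(idx)
    i += 1
s_delta = pd.Series(deltas, name="Delta", index=indexes)
df_delta = s_delta.to_frame()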

Answer 1

Use the assign function to process the df as whole columns (Series) rather than row by row:

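import numpy as np

df = df.assign(
    n=lambda x: x.B.shift(1),             # B one period before
    p=lambda x: x.B.shift(-1),            # B one period after
    s_delta=lambda x: np.abs(x.n - x.p),  # absolute difference of the two
)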

Then you can use np.where to keep s_delta only on the rows you need, for example where A is the first zero of a run.
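
To make the idea concrete, here is a minimal sketch of how the whole loop could be replaced by column-wise operations. The sample data and the names first_zero, delta and with_delta are made up for illustration and are not from the question. It assumes a "first zero" means a row where A is 0 and the previous row's A is not 0, as in the original loop.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column A contains two runs of zeros.
df = pd.DataFrame({
    "A": [1, 0, 0, 2, 3, 0, 0, 0, 4],
    "B": [10.0, 12.0, 15.0, 11.0, 9.0, 14.0, 20.0, 18.0, 7.0],
})

# True only where A is 0 and the previous A is not 0,
# i.e. at the first zero of each run.
# Note: if A starts with 0, row 0 is flagged too, and its delta is NaN
# because there is no previous B value.
first_zero = df["A"].eq(0) & df["A"].shift(1).ne(0)

# |B[i-1] - B[i+1]| for every row, computed on whole columns at once.
delta = (df["B"].shift(1) - df["B"].shift(-1)).abs()

# Option 1: boolean mask with .loc, giving a new frame like df_delta in the question.
df_delta = delta.loc[first_zero].to_frame(name="Delta")

# Option 2: np.where, keeping the original shape and putting NaN elsewhere.
with_delta = df.assign(Delta=np.where(first_zero, delta, np.nan))

print(df_delta)
```

The mask-plus-.loc form yields only the rows of interest, indexed by their original position, which is the closest match to the loop's output. The np.where form is handy if you would rather keep the delta as an extra column aligned with the original df.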

Solution 1
1 2020-11-02 17:48:46
