I would like to vectorize this piece of Python code, a for loop whose result depends on the current state, for speed and efficiency.
The values for df_B are computed from the current state (state) AND the corresponding df_A value.
Any ideas would be appreciated.
import pandas as pd

df_A = pd.DataFrame({'a': [0, 1, -1, -1, 1, -1, 0, 0]})
df_B = pd.DataFrame(data=0, index=df_A.index, columns=['b'])
print(df_A)

state = 0
for index, iter in df_A.iterrows():
    if df_A.loc[index, 'a'] == -1:
        df_B.loc[index, 'b'] = -10 - state
    elif df_A.loc[index, 'a'] == 1:
        df_B.loc[index, 'b'] = 10 - state
    elif df_A.loc[index, 'a'] == 0:
        df_B.loc[index, 'b'] = 0 - state
    temp_state = state
    state += df_B.loc[index, 'b']
print(df_B)
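This seems like overkill. After each iteration, state + b equals df_A['a']*10 for that row, so your state variable is simply the previous row's value of df_A['a']*10 (and 0 before the first row). So we can just use shift: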
s = df_A['a'].mul(10)
df_B['b'] = s - s.shift(fill_value=0)
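As a quick sanity check, the shifted version can be compared against the original loop on the sample data. The sketch below is only an illustration: the helper names loop_version and shift_version are hypothetical and do not appear in the question, and the loop is condensed to 10 * value - state, which matches the three branches only when the values are -1, 0 or 1.

```python
import pandas as pd


def loop_version(a: pd.Series) -> pd.Series:
    """Reference implementation: the stateful loop from the question (hypothetical helper)."""
    state = 0
    out = []
    for value in a:
        b = 10 * value - state  # equivalent to the if/elif branches for -1, 0 and 1
        state += b              # state is now exactly 10 * value
        out.append(b)
    return pd.Series(out, index=a.index, name="b")


def shift_version(a: pd.Series) -> pd.Series:
    """Vectorized version: current 10*a minus the previous row's 10*a (hypothetical helper)."""
    s = a.mul(10)
    return (s - s.shift(fill_value=0)).rename("b")


a = pd.Series([0, 1, -1, -1, 1, -1, 0, 0], name="a")
expected = loop_version(a)
result = shift_version(a)

# Both approaches should agree element by element.
assert result.tolist() == expected.tolist()
print(result.tolist())
```

If the assertion passes, the two approaches produce the same column for this input; it does not prove equivalence for values outside -1, 0 and 1.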
You can make a class where state is an instance attribute. This lets you write a method that can be passed to apply. This isn't a vectorized solution, but it is faster than iterrows. For example:
class ComputeB:
    def __init__(self, state=0):
        self.state = state

    def compute_b(self, row):
        row["b"] = row["a"]*10 - self.state
        self.state += row["b"]
        return row

df = pd.concat([df_A, df_B], axis=1)
cb = ComputeB()
df = df.apply(lambda row: cb.compute_b(row), axis=1)
Now df["b"] contains the values you wanted to compute. This does assume that df_A["a"] can only contain 0, 1 and -1. On my machine, with a column of 40000 values, the approach in the question took 10.4 seconds and this approach took 2.95 seconds.
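If the state cannot be reduced to a closed form, another option is a plain Python loop over the column's values, which typically avoids much of the per-row overhead of iterrows and apply with axis=1. This is a minimal sketch, not a benchmarked solution: the function name compute_b_values is hypothetical, and it makes the same assumption that the column contains only 0, 1 and -1.

```python
import numpy as np
import pandas as pd


def compute_b_values(a_values: np.ndarray, state: int = 0) -> np.ndarray:
    """Stateful recurrence over raw values (hypothetical helper, not from the answers)."""
    out = np.empty(len(a_values), dtype=np.int64)
    # Iterate over plain Python ints rather than DataFrame rows.
    for i, a in enumerate(a_values.tolist()):
        b = 10 * a - state  # same update rule as in the question
        state += b
        out[i] = b
    return out


df_A = pd.DataFrame({"a": [0, 1, -1, -1, 1, -1, 0, 0]})
df_B = pd.DataFrame({"b": compute_b_values(df_A["a"].to_numpy())}, index=df_A.index)
print(df_B["b"].tolist())
```

The loop is still sequential, so it is not vectorized, but it keeps the stateful logic explicit and works on the underlying array instead of building a Series per row.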