
How to replace a pandas column row with the previous row if a condition is met

I'm trying to speed up my trading strategy backtesting.

Right now, I have

for i in trange(1, len(real_choice), disable=not backtesting, desc="Converting HOLDs and calculating backtest correct/incorrect... [3/3]"):
    if advice[i] == "HOLD":
        advice[i] = advice[i-1]
    if real_choice[i] == "HOLD":
        real_choice[i] = real_choice[i-1]

    if advice[i] == real_choice[i]:
        correct[i] = "CORRECT"
    else:
        correct[i] = "INCORRECT"

This part of the code takes the longest, so I want to speed it up.

I'm learning Python, so I wrote this the simple way and it worked, but now I'm paying for it in how long the backtests take.

Is there a way to do this faster?

You can use np.where to compare the two columns and assign a value to each row based on the result:

# advice and real_choice need to be NumPy arrays or pandas Series, not plain lists
correct = np.where(advice == real_choice, "CORRECT", "INCORRECT")

If the data is in a pandas DataFrame, the more idiomatic form is:

df['correct'] = np.where(df['advice'] == df['real_choice'], "CORRECT", "INCORRECT")

Here is the full code for a timing comparison between a loop and np.where:

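import time

import numpy as np
import pandas as pd
from numpy.random import randint

# Two random integer columns to compare row by row
A = randint(0, 10, 10000)
B = randint(0, 10, 10000)

df = pd.DataFrame({'A': A, 'B': B, 'C': "INCORRECT"})
print(df)

# Method 1: explicit Python loop over the rows
start = time.process_time()
for i in range(len(df)):
    if df['A'][i] == df['B'][i]:
        df.loc[i, 'C'] = "CORRECT"
    else:
        df.loc[i, 'C'] = "INCORRECT"
print("method 1", time.process_time() - start)

# Method 2: vectorized comparison with np.where
start = time.process_time()
df['C2'] = np.where(df['A'] == df['B'], "CORRECT", "INCORRECT")
print("method 2", time.process_time() - start)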

Method 2 took far less time to compute, several hundred times faster in this run:

method 1 1.0530679999999997
method 2 0.0022619999999999862
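
The comparison above does not cover the other half of the question, replacing each "HOLD" with the value from the previous row. Because the original loop reads values it has already replaced, a run of consecutive HOLDs all take the last non-HOLD value, which is what a forward fill does. Here is a minimal sketch, assuming the data lives in a DataFrame with string columns named advice and real_choice. The helper name resolve_holds and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd


def resolve_holds(signals: pd.Series) -> pd.Series:
    """Replace each "HOLD" with the most recent non-HOLD value before it."""
    # Turn HOLD entries into missing values, then forward-fill from earlier rows.
    filled = signals.mask(signals == "HOLD").ffill()
    # A HOLD at the very start has nothing before it; leave it as "HOLD".
    return filled.fillna("HOLD")


# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "advice":      ["BUY", "HOLD", "HOLD", "SELL", "HOLD"],
    "real_choice": ["BUY", "SELL", "HOLD", "HOLD", "BUY"],
})

df["advice"] = resolve_holds(df["advice"])
df["real_choice"] = resolve_holds(df["real_choice"])
df["correct"] = np.where(df["advice"] == df["real_choice"], "CORRECT", "INCORRECT")
print(df)
```

One difference from the loop in the question: the loop starts at index 1 and never fills in correct for the first row, while this sketch labels every row, including the first.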
