
Fastest way to use if/else statements when looping through dataframe with pandas

I am trying to run conditional statements while iterating over the rows of a pandas DataFrame, and the resulting code is very slow. For example:

for i, row in df.iterrows():
    # transform date column
    if len(df.loc[i, 'date']) == 7:
        df.loc[i, 'date'] = '0' + df.loc[i, 'date']

The df has only about 40k rows, yet the loop is very slow, and this is just one of several statements I want to include in it. Is there a faster way to write such a loop?

Thank you!

Locate the relevant rows and modify them:

df.loc[df["date"].str.len() == 7, "date"] = "0" + df.loc[df["date"].str.len() == 7, "date"]
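
As a minimal runnable sketch of the line above, the boolean mask can be computed once and reused on both sides of the assignment. The sample dates are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: date strings, some missing the leading zero.
df = pd.DataFrame({"date": ["1012020", "12252020", "3052021"]})

# Compute the condition once instead of twice.
needs_zero = df["date"].str.len() == 7

# Only the rows selected by the mask are read and written.
df.loc[needs_zero, "date"] = "0" + df.loc[needs_zero, "date"]

print(df["date"].tolist())  # ['01012020', '12252020', '03052021']
```

If every value is known to be 7 or 8 characters long (an assumption about the data), df["date"].str.zfill(8) produces the same result in a single call.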

Series.mask() and Series.where() can also be useful for if/else problems.

  • mask() replaces the elements that satisfy cond with other:

     df.date = df.date.mask(cond=df.date.str.len() == 7, other='0' + df.date)
  • where() replaces the elements that do not satisfy cond with other, so we get the same result by flipping the condition from == 7 to != 7:

     df.date = df.date.where(cond=df.date.str.len() != 7, other='0' + df.date)
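
A small sketch showing that the two calls are interchangeable once the condition is flipped. The sample Series is hypothetical, and the names is_short, via_mask and via_where are only for illustration.

```python
import pandas as pd

# Hypothetical sample data.
dates = pd.Series(["1012020", "12252020", "3052021"])
is_short = dates.str.len() == 7

# mask(): replace where the condition is True.
via_mask = dates.mask(is_short, "0" + dates)

# where(): keep where the condition is True, so pass the negated condition.
via_where = dates.where(~is_short, "0" + dates)

print(via_mask.equals(via_where))  # True
print(via_mask.tolist())           # ['01012020', '12252020', '03052021']
```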

But for pure performance, loc[] is slightly faster:

[Figure: timing comparison]
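
To check the relative speed on your own machine, a rough timing sketch along these lines can be used. The 40k-row frame is synthetic, the function names are hypothetical, and the numbers will vary with the pandas version and hardware, so no timings are claimed here.

```python
import timeit

import pandas as pd

# Synthetic data, roughly the size mentioned in the question (40k rows).
base = pd.DataFrame({"date": ["1012020", "12252020"] * 20_000})


def with_loc():
    df = base.copy()  # copy so each run starts from unmodified data
    short = df["date"].str.len() == 7
    df.loc[short, "date"] = "0" + df.loc[short, "date"]
    return df["date"]


def with_mask():
    s = base["date"]
    return s.mask(s.str.len() == 7, "0" + s)


def with_where():
    s = base["date"]
    return s.where(s.str.len() != 7, "0" + s)


candidates = (with_loc, with_mask, with_where)

# Sanity check: all three approaches must produce the same values.
results = {fn.__name__: fn().tolist() for fn in candidates}

for fn in candidates:
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1000:.2f} ms per run")
```

Note that with_loc includes the cost of copying the frame, which the other two do not need because they never modify base in place.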

There are two ways of accomplishing this, easily enough.

The first option is to use .apply, as follows:

def fix_date(row):
    return row['date'] if len(row['date']) != 7 else '0' + row['date']

df['date'] = df.apply(fix_date, axis=1)
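
A runnable version of the snippet above on made-up data, next to a variant that applies the same logic to the date column alone. The column-only variant is an illustration and not part of the original answer; it avoids building a full row object for every record.

```python
import pandas as pd

# Hypothetical sample data; the "value" column only shows that other columns are untouched.
df = pd.DataFrame({"date": ["1012020", "12252020", "3052021"], "value": [1, 2, 3]})


def fix_date(row):
    return row["date"] if len(row["date"]) != 7 else "0" + row["date"]


# Row-wise: fix_date receives each row as a Series.
row_wise = df.apply(fix_date, axis=1)

# Column-only: the function receives each date string directly.
col_wise = df["date"].apply(lambda d: d if len(d) != 7 else "0" + d)

print(row_wise.tolist())  # ['01012020', '12252020', '03052021']
print(row_wise.tolist() == col_wise.tolist())  # True
```

Both forms still call a Python function once per row, so they are usually slower than the vectorized string operations.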

