Fastest way to use if/else statements when looping through dataframe with pandas

Question

I am trying to run conditional statements while iterating over the rows of a pandas df, and the resulting code is very slow. For example:

for i, row in df.iterrows():
    # transform date column
    if len(df.loc[i, 'date']) == 7:
        df.loc[i, 'date'] = '0' + df.loc[i, 'date']

The df is only about 40k rows long, yet the loop is very slow, and this is only one of several statements I want to run inside it. Can you help with a faster way to do such a loop?

Thank you!

Answer 1

Locate the relevant rows and modify them:

df.loc[df["date"].str.len() == 7, "date"] = "0" + df.loc[df["date"].str.len() == 7, "date"]
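
Here is a minimal, self-contained sketch of the same approach on a small made-up frame. The sample dates and the variable name short are assumptions for illustration; the only change from the one-liner above is that the boolean mask is computed once and reused.

```python
import pandas as pd

# Hypothetical sample data: dates stored as strings, some missing the leading zero
df = pd.DataFrame({"date": ["1012021", "12252020", "3152021"]})

# Build the boolean mask once, then reuse it on both sides of the assignment
short = df["date"].str.len() == 7
df.loc[short, "date"] = "0" + df.loc[short, "date"]

print(df["date"].tolist())
```

Only the rows selected by the mask are touched, so no Python-level loop over rows is needed.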

Answer 2

Series.mask() and Series.where() can also be useful for if/else problems.

mask() will replace elements satisfying cond with other:

df.date = df.date.mask(cond=df.date.str.len() == 7, other='0' + df.date)

where() will replace elements not satisfying cond with other, so we can get the same result by flipping the condition from == 7 to != 7:
```
df.date = df.date.where(cond=df.date.str.len() != 7, other='0' + df.date)
```
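
To check that the two calls really are mirror images, here is a small sketch on a made-up Series (the sample values and variable names are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series(["1012021", "12252020", "3152021"])
is_short = s.str.len() == 7

# mask(): replace where the condition is True
masked = s.mask(is_short, "0" + s)

# where(): keep where the condition is True, so pass the negated condition
kept = s.where(~is_short, "0" + s)

# Both spellings should produce the same values
assert masked.tolist() == kept.tolist()
print(masked.tolist())
```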

But for pure performance, loc[] is slightly faster.
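
The original timings are not reproduced here. As a rough way to compare the two on your own machine, a sketch along these lines could be used; the 40k-row synthetic data, the function names, and the repeat count are all assumptions, and the numbers will vary with pandas version and hardware.

```python
import timeit

import pandas as pd

# Hypothetical 40k-row frame mixing 7- and 8-character date strings
n = 40_000
base = pd.DataFrame({"date": ["1012021" if i % 2 else "12252020" for i in range(n)]})


def with_loc():
    df = base.copy()
    short = df["date"].str.len() == 7
    df.loc[short, "date"] = "0" + df.loc[short, "date"]
    return df


def with_mask():
    df = base.copy()
    df["date"] = df["date"].mask(df["date"].str.len() == 7, "0" + df["date"])
    return df


# Sanity check: both approaches should yield the same column
assert with_loc()["date"].tolist() == with_mask()["date"].tolist()

# Time a few repetitions of each; results are machine-dependent
t_loc = timeit.timeit(with_loc, number=5)
t_mask = timeit.timeit(with_mask, number=5)
print(f"loc:  {t_loc:.4f}s")
print(f"mask: {t_mask:.4f}s")
```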

Answer 3

There are two ways of accomplishing this, easily enough.

The first option would be to use the .apply method, in the following way:

def fix_date(row):
    return row['date'] if len(row['date']) != 7 else '0' + row['date']

df['date'] = df.apply(fix_date, axis=1)
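
Note that apply with axis=1 still calls the function once per row in Python. As a runnable sketch on made-up data (the sample frame and the column-wise helper fix_value are assumptions added for illustration), the row-wise version can be compared with applying a function to the single column:

```python
import pandas as pd

# Hypothetical sample data with an extra column to show the row-wise call
df = pd.DataFrame({"date": ["1012021", "12252020"], "value": [1, 2]})


def fix_date(row):
    # Receives a whole row; only the 'date' field is used
    return row["date"] if len(row["date"]) != 7 else "0" + row["date"]


def fix_value(value):
    # Hypothetical column-wise variant: receives one string at a time
    return value if len(value) != 7 else "0" + value


row_wise = df.apply(fix_date, axis=1)
col_wise = df["date"].apply(fix_value)

# Both variants should produce the same values
assert row_wise.tolist() == col_wise.tolist()
df["date"] = col_wise
print(df["date"].tolist())
```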

3 answers

solution1
2 ACCEPTED 2021-01-16 16:40:45

solution2
0 2021-06-20 08:47:06

solution3
-1 2021-01-16 16:33:51
