How to compare datetimes in a pandas DataFrame

Question

I have three columns with date information. They indicate events that need to happen in a particular order, and I would like to check whether the order is incorrect for any row in the DataFrame.

I prepared each column with pd.to_datetime()

Let's say the rule should be column a < b < c, so I tried this:

count = 0
drop_rows = []  # index labels of rows that violate the order
for idx, _ in df.iterrows():
    # a must not be later than b
    if df.loc[idx, 'a'] > df.loc[idx, 'b']:
        print(f"Invalid b in line {idx}")
        print(f"{df.loc[idx, 'a']} {df.loc[idx, 'b']}")
        drop_rows.append(idx)
        count += 1
    # b must not be later than c
    if df.loc[idx, 'b'] > df.loc[idx, 'c']:
        print(f"Invalid c in line {idx}")
        drop_rows.append(idx)
        count += 1
print(f"{count} invalid rows")

This works for almost all rows, but for 36 rows that are actually correct I still get output like the following:

Invalid b in line 5883 2014-03-06 00:00:00 2014-03-06 00:00:00
Invalid b in line 24442 2011-11-14 00:00:00 2011-11-14 00:00:00

I also replaced if df.loc[idx, 'a'] > df.loc[idx, 'b']: with if not df.loc[idx, 'a'] <= df.loc[idx, 'b']: but these correct entries are still reported as wrong.

Why does Python think these are not the same dates, and how can I change that?

Also, is there a faster way to go through the DataFrame than iterrows?

Answer 1

You don't necessarily need to iterate (potentially slowly) through your DataFrame rows. You can filter the DataFrame down to all rows that meet either condition, like so:

abc_errors = df.loc[(df['a'] > df['b']) | (df['b'] > df['c'])]

Alternatively, you can filter for the a/b errors and the b/c errors separately:

ab_errors = df.loc[df['a'] > df['b']]
bc_errors = df.loc[df['b'] > df['c']]
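
As a rough illustration of the filtering approach above, here is a minimal self-contained sketch on made-up data. The sample dates and the names invalid and clean are assumptions for the example, not part of the question. It counts the offending rows and drops them without a Python-level loop:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "a": pd.to_datetime(["2014-03-06", "2011-11-14", "2015-01-10"]),
    "b": pd.to_datetime(["2014-03-06", "2011-11-01", "2015-01-11"]),
    "c": pd.to_datetime(["2014-03-07", "2011-11-20", "2015-01-09"]),
})

# Boolean Series: True where a is later than b, or b is later than c.
# The comparisons are strict, so equal dates are not flagged.
invalid = (df["a"] > df["b"]) | (df["b"] > df["c"])

# Summing a boolean Series counts the True values.
print(f"{invalid.sum()} invalid rows")

# Keep only the rows whose order is valid.
clean = df.loc[~invalid]
```

Note that this counts each offending row once, whereas the loop in the question increments count twice for a row that breaks both rules.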

solution1
1 ACCEPTED 2020-07-16 19:34:35
