Conditional Formatting using Pandas Dataframe

Question

I am working with a pandas DataFrame that has two columns: enrollment (e_gk) and attendance (a_gk). The data contains some errors where attendance is higher than the actual enrollment. In those rows I want to replace the attendance value with the enrollment value.

This is my main line of code for this condition. In the iteration, 'e' stands for enrollment and 'a' for attendance.

df['a_gk'] = [e if a > e else a for a, e in df.a_gk and df.e_gk]

This gives me the following error:

"ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()"

Answer 1

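The problem is that the and operator does not work element-wise on Series. In df.a_gk and df.e_gk, Python asks for the truth value of the whole Series, which is ambiguous, instead of pairing up the values. You probably want to zip the two columns together.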

df['a_gk'] = [e if a > e else a for a, e in zip(df.a_gk, df.e_gk)]
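To see the difference on a small example, here is a minimal sketch. The sample values are made up for illustration; only the column names e_gk and a_gk come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"e_gk": [30, 25, 40], "a_gk": [28, 31, 40]})

# `and` needs a single True/False from its left operand,
# which a multi-element Series cannot provide, so pandas raises ValueError.
try:
    df.a_gk and df.e_gk
    raised = False
except ValueError:
    raised = True

# zip pairs the two columns element by element,
# so the comprehension compares plain scalars.
fixed = [e if a > e else a for a, e in zip(df.a_gk, df.e_gk)]
```

Only the second row has attendance above enrollment, so only that value is replaced.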

But you could also use apply on rows.

df['a_gk'] = df.apply(lambda row: row['e_gk'] if row['a_gk'] > row['e_gk'] else row['a_gk'], axis=1)

Or with np.where, which needs import numpy as np:

df['a_gk'] = np.where(df['a_gk'] > df['e_gk'], df['e_gk'], df['a_gk'])
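As a quick sanity check, the sketch below runs np.where and a row-wise apply on the same made-up data and compares the results. The sample values are hypothetical. The row-wise minimum at the end is not part of the answer above; it is included only as a cross-check, on the assumption that capping attendance at enrollment is the same as taking the smaller of the two values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"e_gk": [30, 25, 40], "a_gk": [28, 31, 40]})

# np.where(condition, value_if_true, value_if_false), evaluated element-wise.
capped_where = np.where(df["a_gk"] > df["e_gk"], df["e_gk"], df["a_gk"])

# Row-wise apply: keep attendance unless it exceeds enrollment.
capped_apply = df.apply(
    lambda row: row["e_gk"] if row["a_gk"] > row["e_gk"] else row["a_gk"],
    axis=1,
)

# Cross-check (an assumption, not from the answer): the row-wise minimum
# of the two columns should give the same capped values.
capped_min = df[["a_gk", "e_gk"]].min(axis=1)

df["a_gk"] = capped_where
```

On larger frames the np.where form is usually the better choice, since it works on whole columns at once rather than calling a Python function per row.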

solution1
0 ACCEPTED 2021-04-30 09:03:32
