
Conditional Formatting using Pandas Dataframe

I am working with a pandas DataFrame that has two columns: enrollment (e_gk) and attendance (a_gk). The data contains some errors where attendance is higher than the actual enrollment. In those rows I want to replace the attendance value with the enrollment value.

This is my main line of code for this condition. In the comprehension, 'e' stands for enrollment and 'a' for attendance.

df['a_gk'] = [e if a > e else a for a, e in df.a_gk and df.e_gk]

This gives me the following error:

"ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()"

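The problem is that the and operator does not work element-wise on Series: df.a_gk and df.e_gk asks Python to evaluate a whole Series as a single True/False value, which pandas refuses to do. To iterate over the two columns in pairs, zip them together instead.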

df['a_gk'] = [e if a > e else a for a, e in zip(df.a_gk, df.e_gk)]
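To make the difference concrete, here is a minimal sketch on a small made-up DataFrame (the values are hypothetical, only the column names come from the question). It shows that the and expression raises the ValueError while the zip version yields the capped values.

```python
import pandas as pd

# Hypothetical sample data: the second row has attendance above enrollment.
df = pd.DataFrame({"e_gk": [30, 25, 40], "a_gk": [28, 31, 40]})

# `x and y` first calls bool(x); pandas will not reduce a
# multi-element Series to a single boolean, so this raises ValueError.
try:
    df.a_gk and df.e_gk
    raised = False
except ValueError:
    raised = True

# zip pairs the two columns element by element, so the comprehension works.
capped = [e if a > e else a for a, e in zip(df.a_gk, df.e_gk)]
print(raised, capped)
```

Only the second row changes, because it is the only one where attendance exceeds enrollment.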

But you could also use apply on rows.

df['a_gk'] = df.apply(lambda row: row['e_gk'] if row['a_gk'] > row['e_gk'] else row['a_gk'], axis=1)

Or with np.where (this needs import numpy as np), which is vectorized and avoids a Python-level loop:

df['a_gk'] = np.where(df['a_gk'] > df['e_gk'], df['e_gk'], df['a_gk'])
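For completeness, a self-contained sketch of the vectorized approach, again on hypothetical sample values. Since the goal is to cap attendance at enrollment, the row-wise minimum and Series.clip are shown as equivalent alternatives; they are not part of the original answer, just other ways to express the same cap.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the second row has attendance above enrollment.
df = pd.DataFrame({"e_gk": [30, 25, 40], "a_gk": [28, 31, 40]})

# np.where(condition, value_if_true, value_if_false), evaluated per element.
via_where = np.where(df["a_gk"] > df["e_gk"], df["e_gk"], df["a_gk"])

# Alternative: the capped attendance is the smaller of the two columns.
via_min = df[["a_gk", "e_gk"]].min(axis=1)

# Alternative: clip attendance with a per-row upper bound.
via_clip = df["a_gk"].clip(upper=df["e_gk"])

# Write the corrected values back to the attendance column.
df["a_gk"] = via_where
print(df)
```

All three produce the same values here; np.where is the most general of them, since the replacement value does not have to be the other column.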
