I am working with a pandas DataFrame. It has two columns: enrollment (e_gk) and attendance (a_gk). There are some errors in the data where attendance is higher than the actual enrollment. I want to replace the attendance value with the actual enrollment in those rows.
This is my main line of code for this condition. In the iteration, 'e' stands for enrollment and 'a' for attendance.
df['a_gk'] = [e if a > e else a for a, e in df.a_gk and df.e_gk]
This gives me the following error:
"ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()"
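The problem is that the 'and' operator doesn't support element-wise Series operations. In df.a_gk and df.e_gk, Python asks for the truth value of the whole df.a_gk Series, which is ambiguous and raises the ValueError. You may want to zip the two columns together: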
df['a_gk'] = [e if a > e else a for a, e in zip(df.a_gk, df.e_gk)]
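As a quick check, here is a minimal sketch of the zip approach on a small made-up DataFrame. The sample values are an assumption for illustration only and are not from the question; only the column names e_gk and a_gk come from it.

```python
import pandas as pd

# Hypothetical sample data: rows 1 and 3 have attendance above enrollment.
df = pd.DataFrame({"e_gk": [30, 25, 40, 10], "a_gk": [28, 27, 40, 15]})

# zip pairs each attendance value with the enrollment value from the same row;
# the comprehension keeps enrollment whenever attendance exceeds it.
df["a_gk"] = [e if a > e else a for a, e in zip(df.a_gk, df.e_gk)]

result = df["a_gk"].tolist()
print(result)  # [28, 25, 40, 10]
```

The list comprehension builds a plain Python list, so it is assigned back by position rather than by index label.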
But you could also use apply on rows:
df['a_gk'] = df.apply(lambda row: row['e_gk'] if row['a_gk'] > row['e_gk'] else row['a_gk'], axis=1)
Or with np.where (this needs import numpy as np):
df['a_gk'] = np.where(df['a_gk'] > df['e_gk'], df['e_gk'], df['a_gk'])
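For comparison, the sketch below runs the np.where version next to a few other vectorized spellings that should give the same result: Series.where, Series.clip with an upper bound, and a row-wise min. These alternatives are suggestions and not part of the original answer. The sample data is made up, and the sketch assumes both columns are numeric with no missing values (the variants can differ in how they treat NaN).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows 1 and 3 have attendance above enrollment.
df = pd.DataFrame({"e_gk": [30, 25, 40, 10], "a_gk": [28, 27, 40, 15]})

# np.where: take enrollment where attendance is too high, else attendance.
via_np_where = np.where(df["a_gk"] > df["e_gk"], df["e_gk"], df["a_gk"])

# Series.where keeps values where the condition holds and substitutes elsewhere.
via_where = df["a_gk"].where(df["a_gk"] <= df["e_gk"], df["e_gk"])

# Series.clip caps each attendance value at the enrollment of the same row.
via_clip = df["a_gk"].clip(upper=df["e_gk"])

# Row-wise minimum of the two columns.
via_min = df[["a_gk", "e_gk"]].min(axis=1)

# All four variants agree on this data, so any of them can be assigned back.
df["a_gk"] = via_clip
print(df)
```

On large frames the vectorized forms are usually much faster than apply with axis=1, which calls a Python function once per row.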