Existing DataFrame and desired result (either pandas or NumPy):

    contactid  bonustype  bonusreceived  NEW_COLUMN
    100        a          yes            ab
    100        b          no             NULL
    200        a          no             NULL
    200        b          yes            abc
    200        c          yes            abc

For each contactid, I have to check bonustype: if both values (a, b) are present and bonusreceived is 'yes', return 'ab' in NEW_COLUMN. If all three bonustype values (a, b, c) are present and bonusreceived is 'yes', return 'abc' in NEW_COLUMN.
I have tried several approaches but have not been able to get the desired result above. Any help would be highly appreciated.
Thanks
With the clarified requirements that, within each contactid, each bonustype should be used only once in the aggregated text in NEW_COLUMN, and that where bonusreceived == 'no' the corresponding NEW_COLUMN should be NULL, we can use .groupby() + transform() to join the unique text of bonustype. Then, use np.where() so that we get the aggregated text only where bonusreceived == 'yes', and NaN otherwise.

    import numpy as np

    # Join each contactid's unique bonustype values, then keep the text only on 'yes' rows
    df['NEW_COLUMN'] = np.where(df['bonusreceived'] == 'yes',
                                df.groupby('contactid')['bonustype'].transform(lambda x: ''.join(x.unique())),
                                np.nan)

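To see what each step contributes, the one-liner above can be split in two. This is a minimal sketch that rebuilds the answer's sample rows by hand; the intermediate name joined is introduced here for illustration only and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the answer's data input (7 rows, 2 contact ids).
df = pd.DataFrame({
    "contactid": [100, 100, 200, 200, 200, 100, 200],
    "bonustype": ["a", "b", "a", "b", "c", "a", "a"],
    "bonusreceived": ["yes", "no", "no", "yes", "yes", "no", "yes"],
})

# Step 1: per contactid, join the distinct bonustype values in order of first
# appearance. transform() broadcasts that one string back to every row of the
# group, so `joined` has the same length and index as df.
joined = df.groupby("contactid")["bonustype"].transform(
    lambda s: "".join(s.unique())
)

# Step 2: keep the joined text only on rows where the bonus was received;
# every other row gets NaN.
df["NEW_COLUMN"] = np.where(df["bonusreceived"] == "yes", joined, np.nan)

print(joined.tolist())
print(df)
```

Note that step 1 ignores bonusreceived entirely: the text for a contactid is built from all of its bonustype values, and bonusreceived only decides which rows display it.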
Data input:

    print(df)

       contactid  bonustype  bonusreceived
    0        100          a            yes
    1        100          b             no
    2        200          a             no
    3        200          b            yes
    4        200          c            yes
    5        100          a             no
    6        200          a            yes

Result:

    print(df)

       contactid  bonustype  bonusreceived  NEW_COLUMN
    0        100          a            yes          ab
    1        100          b             no         NaN
    2        200          a             no         NaN
    3        200          b            yes         abc
    4        200          c            yes         abc
    5        100          a             no         NaN
    6        200          a            yes         abc
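
As a variant (not what the answer itself uses), the masking step can also be written with pandas' Series.where(), which keeps values where the condition is True and fills NaN elsewhere by default, so the NumPy import is not needed. A minimal sketch on the same sample rows, with the frame rebuilt by hand so the snippet runs on its own:

```python
import pandas as pd

# Same sample rows as the answer's data input.
df = pd.DataFrame({
    "contactid": [100, 100, 200, 200, 200, 100, 200],
    "bonustype": ["a", "b", "a", "b", "c", "a", "a"],
    "bonusreceived": ["yes", "no", "no", "yes", "yes", "no", "yes"],
})

# Build the per-contactid text exactly as in the answer, then mask it:
# Series.where(cond) keeps entries where cond is True and sets the rest to NaN.
df["NEW_COLUMN"] = (
    df.groupby("contactid")["bonustype"]
      .transform(lambda s: "".join(s.unique()))
      .where(df["bonusreceived"] == "yes")
)

print(df)
```

Because the result stays a pandas Series throughout, its index remains aligned with df, whereas np.where() returns a plain array that is assigned by position.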