Following on from this question, I have this dataframe:
   ChildID  MotherID  preWeight
0       20       455       3500
1       20       455       4040
2       13       102        NaN
3      702       946       5000
4       82       571       2000
5       82       571       3500
6       82       571       3800
where I transformed feature 'preWeight' that has multiple observations per MotherID to feature 'preMacro' with a single observation per MotherID, based on the following rules:
Using this line of code:
df.groupby(['ChildID','MotherID']).agg(lambda x: 'Yes' if (x>4000).any() else 'No').reset_index().rename(columns={"preWeight": "preMacro"})
However, I realised that this way I am not preserving the NaN values in the dataset, which ideally should be imputed rather than just assigning them "No" values. So I tried changing the above line to:
df=df.groupby(['MotherID', 'ChildID'])['preWeight'].agg(
lambda x: 'Yes' if (x>4000).any() else (np.NaN if 'no_value' in x.values.all() else 'No')).reset_index().rename(
columns={"preWeight": "preMacro"})
I wanted this line to transform the above dataframe to this:
   ChildID  MotherID preMacro
0       20       455      Yes
1       13       102      NaN
2      702       946      Yes
3       82       571       No
However I got this error when running it:
TypeError: argument of type 'float' is not iterable
I understand that, in the case of non-missing values, the values of x.values.all() are float numbers, which are not iterable, but I am not sure how else to code this, any ideas?
Thanks.
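The error can be reproduced in isolation. The following is only a minimal sketch, assuming a float column and a hypothetical two-row group: `x.values.all()` collapses the whole group to a single scalar, so `in` has nothing to iterate over. The exact type named in the message depends on the column's dtype. Testing for missing values element-wise with `isna()` and only then reducing avoids the problem.

```python
import numpy as np
import pandas as pd

x = pd.Series([3500.0, 4040.0])   # hypothetical group of weights

# .all() reduces the array to one scalar for the whole group
reduced = x.values.all()
print(type(reduced), np.ndim(reduced))

# Membership tests need a container, so a scalar raises TypeError
try:
    'no_value' in reduced
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Check for missing values per element first, then reduce
group_is_missing = pd.Series([np.nan]).isna().all()
print(group_is_missing)
```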
For performance, don't test inside a custom function per group. It is better to aggregate with GroupBy.agg on helper boolean columns, using GroupBy.all and GroupBy.any, and then set the column preMacro with numpy.select:
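import numpy as np

# helper flags per row: weight above 4000, and weight not missing
df = (df.assign(testconst = df['preWeight'] > 4000,
                testna = df['preWeight'].notna())
        .groupby(['ChildID','MotherID'], sort=False)
        .agg({'testconst':'any', 'testna':'all'}))
# np.select uses the first matching condition: any weight above 4000 gives 'Yes'
# (even if the group also has NaN, like the lambda version), else 'No' if nothing is missing
masks = [df['testconst'], df['testna']]
df['preMacro'] = np.select(masks, ['Yes','No'], default=None)
df = df.drop(['testconst','testna'], axis=1).reset_index()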
print (df)
   ChildID  MotherID preMacro
0       20       455      Yes
1       13       102     None   <- None is used so that np.nan is not converted to the string 'nan'
2      702       946      Yes
3       82       571       No
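How numpy.select assigns the labels can be seen on its own. This is only an illustrative sketch: the two boolean arrays are hypothetical per-group flags (not taken from the data above) standing in for "any weight above 4000" and "no missing weights". The first condition that is True decides the label, and `default=None` covers the rows where no condition matches.

```python
import numpy as np

# hypothetical flags for three groups
any_above = np.array([True, False, False])    # some weight above 4000
all_present = np.array([True, True, False])   # no missing weights

# conditions are checked in order; the first True one picks the label
labels = np.select([any_above, all_present], ['Yes', 'No'], default=None)
print(labels.tolist())
```

Because the default is None rather than np.nan, the result is an object array and the missing label is not turned into a string.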
If the DataFrame is small or performance is not important:
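# df here is the original DataFrame from the question
# np.nan is used because the np.NaN alias was removed in NumPy 2.0
f = lambda x: 'Yes' if (x>4000).any() else ('No' if x.notna().all() else np.nan)
df1 = (df.groupby(['ChildID','MotherID'], sort=False)['preWeight']
         .agg(f)
         .reset_index(name='preMacro'))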
print (df1)
   ChildID  MotherID preMacro
0       20       455      Yes
1       13       102      NaN
2      702       946      Yes
3       82       571       No
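To try this end to end, the sample frame from the question can be rebuilt by hand. This is a sketch under the assumption that preWeight is a numeric column with a real NaN (not a placeholder string); the function name `to_macro` is only illustrative and does the same as the lambda.

```python
import numpy as np
import pandas as pd

# sample data from the question, with NaN as a true missing value
df = pd.DataFrame({
    'ChildID':   [20, 20, 13, 702, 82, 82, 82],
    'MotherID':  [455, 455, 102, 946, 571, 571, 571],
    'preWeight': [3500, 4040, np.nan, 5000, 2000, 3500, 3800],
})

def to_macro(x):
    """'Yes' if any weight exceeds 4000, NaN if a weight is missing, else 'No'."""
    if (x > 4000).any():
        return 'Yes'
    return 'No' if x.notna().all() else np.nan

out = (df.groupby(['ChildID', 'MotherID'], sort=False)['preWeight']
         .agg(to_macro)
         .reset_index(name='preMacro'))
print(out)
```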