简体   繁体   中英

TypeError: argument of type 'float' is not iterable when using lambda function in dataframe

Following on from this question , I have this dataframe:

    ChildID   MotherID   preWeight
0     20      455        3500
1     20      455        4040
2     13      102        NaN
3     702     946        5000
4     82      571        2000
5     82      571        3500
6     82      571        3800

where I transformed feature 'preWeight' that has multiple observations per MotherID to feature 'preMacro' with a single observation per MotherID, based on the following rules:

  • if preWeight>=4000 for a particular MotherID, I assigned preMacro a value of "Yes" regardless of the remaining observations
  • Otherwise I assigned preMacro a value of "No"

Using this line of code:

    df.groupby(['ChildID','MotherID']).agg(lambda x: 'Yes' if (x>4000).any() else 'No').reset_index().rename(columns={"preWeight": "preMacro"})

However, I realised that this way I am not preserving the NaN values in the dataset, which ideally should be imputed rather than just assigning them "No" values. So I tried changing the above line to:

    df=df.groupby(['MotherID', 'ChildID'])['preWeight'].agg(
        lambda x: 'Yes' if (x>4000).any() else (np.NaN if 'no_value' in x.values.all() else 'No')).reset_index().rename(
        columns={"preWeight": "preMacro"})

I wanted this line to transform the above dataframe to this:

    ChildID   MotherID   preMacro
0   20        455        Yes
1   13        102        NaN
2   702       946        Yes
3   82        571        No

However I got this error when running it:

TypeError: argument of type 'float' is not iterable

I understand that, in the case of non-missing values, the values of x.values.all() are float numbers, which are not iterable, but I am not sure how else to code this, any ideas?

Thanks.

For performance dont test in custom function per groups, better is aggregate by GroupBy.agg by helper column for boolean mask with GroupBy.all and GroupBy.any and then set column preMacro by numpy.select :

df = (df.assign(testconst = df['preWeight'] > 4000,
                testna = df['preWeight'].notna())
        .groupby(['ChildID','MotherID'], sort=False)
        .agg({'testconst':'any', 'testna':'all'}))


masks = [df['testconst'] & df['testna'], df['testconst'] | df['testna']]
df['preMacro'] = np.select(masks, ['Yes','No'], default=None)
df = df.drop(['testconst','testna'], axis=1).reset_index()
print (df)
   ChildID  MotherID preMacro
0       20       455      Yes
1       13       102     None <- for avoid convert np.NaN to string nan is used None
2      702       946      Yes
3       82       571       No

If small DataFrame or performance is not important:

f = lambda x: 'Yes' if (x>4000).any() else ('No' if x.notna().all() else np.NaN)
df1 = (df.groupby(['ChildID','MotherID'], sort=False)['preWeight']
         .agg(f)
         .reset_index(name='preMacro'))
print (df1)
   ChildID  MotherID preMacro
0       20       455      Yes
1       13       102      NaN
2      702       946      Yes
3       82       571       No

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM