How to chain apply functions to subset a pandas dataframe

I have a pandas dataframe that I would like to subset based on two functions, is_long() and is_short(). Both must return True for a row to be kept. For example:

import pandas as pd

data = [['foo', 10], ['baar', 15], ['baz', 14], ['baaar', 15]]
df = pd.DataFrame(data, columns = ['name', 'age'])
df

def is_long(x):
    assert isinstance(x, str)
    return True if len(x) > 2 else False


def is_short(x):
    assert isinstance(x, str)
    return True if len(x) < 4 else False

The following should return the rows whose name has a length of 3:

df[df['name'].apply(is_long).apply(is_short)]

should return:

    name    age
0   foo     10
2   baz     14

but the second apply does not run on the original column, and it raises an assertion error:

   11 
     12 def is_short(x):
---> 13     assert isinstance(x, str)
     14     return True if len(x) < 4 else False

AssertionError: 

My question is - how does one elegantly chain together two apply functions (without writing two separate lines of code) so that they are acting on the same column and are executed in sequence?

Any advice here would be appreciated.

If you want to do it with the apply() method, then instead of:

df[df['name'].apply(is_long).apply(is_short)]

use this, which filters with is_long first and then applies is_short only to the rows that remain:

df[df['name'].apply(is_long)].loc[lambda d: d['name'].apply(is_short)]

OR

df[df['name'].apply(lambda x: is_long(x) & is_short(x))]

Output of the above:

    name    age
0   foo     10
2   baz     14
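A related option, offered as a sketch rather than as part of the answer above, is to apply each function to the original column separately and combine the two boolean masks with &. Both masks share the dataframe's index, so they line up row by row. The variable names names, mask, and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [['foo', 10], ['baar', 15], ['baz', 14], ['baaar', 15]],
    columns=['name', 'age'],
)


def is_long(x):
    assert isinstance(x, str)
    return len(x) > 2


def is_short(x):
    assert isinstance(x, str)
    return len(x) < 4


names = df['name']

# Each apply sees the original strings, so neither assert is triggered.
mask = names.apply(is_long) & names.apply(is_short)

# Keep only the rows where both functions returned True.
result = df[mask]
print(result)
```

Unlike the lambda version, this calls both functions on every row, because there is no short-circuiting between the two masks.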

Explanation:

In your code, df[df['name'].apply(is_long).apply(is_short)]:

df['name'].apply(is_long) returns a boolean Series (all True for this data). Chaining another apply() onto it passes those booleans, not the original strings, to is_short(). The assert isinstance(x, str) check in is_short() then fails, which is why you get the AssertionError.
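The failure can be reproduced step by step. The following is a minimal sketch using the data and functions from the question; the names first_pass and raised are only there for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['foo', 10], ['baar', 15], ['baz', 14], ['baaar', 15]],
    columns=['name', 'age'],
)


def is_long(x):
    assert isinstance(x, str)
    return len(x) > 2


def is_short(x):
    assert isinstance(x, str)
    return len(x) < 4


# The first apply maps every name to a bool, so the strings are gone.
first_pass = df['name'].apply(is_long)
print(first_pass.tolist())  # [True, True, True, True]

# A second apply therefore calls is_short on a bool, which trips the assert.
try:
    first_pass.apply(is_short)
    raised = False
except AssertionError:
    raised = True

print(raised)  # True
```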

You can wrap the two functions in a single lambda, i.e.:

df[df.name.apply(lambda x: is_long(x) and is_short(x))]

to get

  name  age
0  foo   10
2  baz   14
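If more than two checks need to be combined, the same idea can be generalised. The sketch below assumes a small helper, here called all_of (a hypothetical name, not a pandas function), that builds one predicate out of several so that a single apply is enough.

```python
import pandas as pd

df = pd.DataFrame(
    [['foo', 10], ['baar', 15], ['baz', 14], ['baaar', 15]],
    columns=['name', 'age'],
)


def is_long(x):
    assert isinstance(x, str)
    return len(x) > 2


def is_short(x):
    assert isinstance(x, str)
    return len(x) < 4


def all_of(*predicates):
    """Hypothetical helper: return a function that is True only if every predicate is."""
    def combined(x):
        # all() stops at the first False, like chaining the checks with `and`.
        return all(p(x) for p in predicates)
    return combined


# One apply, with each predicate receiving the original string.
result = df[df['name'].apply(all_of(is_long, is_short))]
print(result)
```

Because every predicate receives the original value, the isinstance assertions keep working no matter how many functions are combined.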

