
Using df.apply to call a function that involves different columns

Given a pd.DataFrame like:

    to_remove        pred_0         ....  pred_10
0   ['apple']       ['apple','abc'] ....  ['apple','orange']    
1   ['cd','sister'] ['uncle','cd']  ....  ['apple']

In each row, I want to remove an element from pred_0 ... pred_10 if that element appears in to_remove in the same row.

In this example, the answer should be:

    to_remove        pred_0     ....  pred_10
0   ['apple']        ['abc']    ....  ['orange']    # 'apple' removed from this row
1   ['cd','sister']  ['uncle']  ....  ['apple']     # 'cd' and 'sister' removed from this row

I am wondering how to write the code to do this.

To generate the example df:

import pandas as pd

# Plain dicts preserve insertion order in Python 3.7+, so OrderedDict is not needed.
df = pd.DataFrame({'to_remove': [['apple'], ['cd', 'sister']], 'pred_0': [['apple', 'abc'], ['uncle', 'cd']], 'pred_1': [['apple', 'orange'], ['apple']]})

You can iterate over the rows and, for each row, keep only the elements that do not appear in that row's to_remove list.

Consider the dataframe:

        pred_0      pred_10       to_remove
0   [apple, abc]    [apple, orange]  [apple]
1   [uncle, cd]      [apple]        [cd, sister]

df.apply(lambda x: x[x.index.difference(['to_remove'])].apply(lambda y: [i for i in y if i not in x['to_remove']]), axis=1)

Out:

    pred_0  pred_10
0   [abc]   [orange]
1   [uncle] [apple]
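
The one-liner above packs two nested apply calls into one expression. Below is a minimal sketch of the same row-wise idea with a named function. The helper name strip_row and the pred_cols list are illustrative and not part of the original answer, and the frame is the question's example with pred_0 and pred_1 only.

```python
import pandas as pd

df = pd.DataFrame({
    'to_remove': [['apple'], ['cd', 'sister']],
    'pred_0': [['apple', 'abc'], ['uncle', 'cd']],
    'pred_1': [['apple', 'orange'], ['apple']],
})

# Hypothetical column list; extend it to pred_0 ... pred_10 for the real frame.
pred_cols = ['pred_0', 'pred_1']

def strip_row(row):
    """Return the pred_* lists of one row without the items in to_remove."""
    banned = set(row['to_remove'])
    return pd.Series(
        {col: [item for item in row[col] if item not in banned] for col in pred_cols}
    )

# axis=1 passes each row to strip_row as a Series.
cleaned = df.apply(strip_row, axis=1)

# Put to_remove back next to the filtered columns.
result = pd.concat([df[['to_remove']], cleaned], axis=1)
print(result)
```

Unlike the one-liner, this version keeps the to_remove column in the result, which matches the layout asked for in the question.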

You can use a couple of list comprehensions:

s = df['to_remove'].map(set)

for col in ['pred_0', 'pred_1']:
    df[col] = [[i for i in L if i not in S] for L, S in zip(df[col], s)]

print(df)

      to_remove   pred_0    pred_1
0       [apple]    [abc]  [orange]
1  [cd, sister]  [uncle]   [apple]

List comprehensions will likely be more efficient than pd.DataFrame.apply, which has the expense of constructing and passing a Series to a function for each row. As you can see, your requirement does not really leverage Pandas / NumPy.
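
The efficiency claim is easy to check on your own data. The sketch below times both approaches with timeit on a frame built by repeating the two example rows. The frame size, the function names with_apply and with_listcomp, and the number of runs are arbitrary choices for illustration, and the timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

small = pd.DataFrame({
    'to_remove': [['apple'], ['cd', 'sister']],
    'pred_0': [['apple', 'abc'], ['uncle', 'cd']],
    'pred_1': [['apple', 'orange'], ['apple']],
})

# Repeat the example rows to get a frame large enough to time (size is arbitrary).
big = pd.concat([small] * 1000, ignore_index=True)
pred_cols = ['pred_0', 'pred_1']

def with_apply(frame):
    """Row-wise DataFrame.apply: one Series is built per row."""
    return frame.apply(
        lambda row: row[pred_cols].apply(
            lambda items: [i for i in items if i not in row['to_remove']]
        ),
        axis=1,
    )

def with_listcomp(frame):
    """Column-wise list comprehensions over plain Python lists and sets."""
    sets = frame['to_remove'].map(set)
    return pd.DataFrame(
        {
            col: [[i for i in items if i not in s] for items, s in zip(frame[col], sets)]
            for col in pred_cols
        },
        index=frame.index,
    )

# Both approaches should produce the same filtered lists.
assert with_apply(big).values.tolist() == with_listcomp(big).values.tolist()

for func in (with_apply, with_listcomp):
    seconds = timeit.timeit(lambda: func(big), number=3)
    print(f'{func.__name__}: {seconds:.3f} s for 3 runs')
```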

As such, unless you can afford to expand your lists into series of strings, dict + list may be a more appropriate choice of data structure.
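
One possible reading of "expand your lists into series of strings" is Series.explode, which gives one string per row while keeping the original row label as the index. The sketch below uses that to do the per-row membership test with pandas index operations. The helper filter_exploded is hypothetical, and this is only one way to do it, not necessarily what the answer had in mind.

```python
import pandas as pd

df = pd.DataFrame({
    'to_remove': [['apple'], ['cd', 'sister']],
    'pred_0': [['apple', 'abc'], ['uncle', 'cd']],
    'pred_1': [['apple', 'orange'], ['apple']],
})

def filter_exploded(frame, col, remove_col='to_remove'):
    """Hypothetical helper: drop items of `col` that occur in `remove_col` on the same row."""
    items = frame[col].explode()          # one string per entry, index = original row label
    banned = frame[remove_col].explode()
    # Pair every string with its row label so membership is checked per row.
    item_pairs = pd.MultiIndex.from_arrays([items.index, items])
    banned_pairs = pd.MultiIndex.from_arrays([banned.index, banned])
    kept = items[~item_pairs.isin(banned_pairs)]
    # Collect the surviving strings back into one list per row.
    return kept.groupby(level=0).agg(list)

for col in ['pred_0', 'pred_1']:
    df[col] = filter_exploded(df, col)

print(df)
```

Note that a row whose list ends up empty is absent from the grouped result, so it becomes NaN after assignment rather than an empty list. If that case matters, the list-comprehension version is simpler.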
