Find items in one list but not in another in a pandas dataframe column

I keep running into dead ends here, and it's killing me.

Dataframe:

accountid    col2                       col3
1            ['abc','def','xyz']        ['abc','mda','xyz','sdi']
2            ['abc','asd','xyz','dib']  ['nio','ouy','abc']
3            ['abc','def','xyz']        ['abc','mda','xyz']

Notes

*each field in col2 and col3 is a list

*the lists in col2 and col3 may not contain the same number of items

I'm trying to create a col4 that shows the items in col3 that are not in col2. The result should look like this:

accountid    col2                      col3                        col4
1            ['abc','def','xyz']       ['abc','mda','xyz','sdi']   ['mda','sdi']
2            ['abc','asd','xyz','dib'] ['nio','ouy','abc']         ['nio','ouy']
3            ['abc','def','xyz']       ['abc','mda','xyz']         ['mda']

Let me know if this doesn't make sense. I appreciate any help at all on this.

You can convert each list to a set and subtract one column from the other:

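# Turn each list into a set, then subtract row by row (set difference)
s = df.col3.apply(set) - df.col2.apply(set)
s
0    {sdi, mda}
1    {nio, ouy}
2         {mda}
dtype: object
# Convert the sets back to lists (sets do not preserve the original item order)
df['col4'] = s.map(list)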

Check the result

s.map(list)
0    [sdi, mda]
1    [nio, ouy]
2         [mda]
dtype: object
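
If the order of the items matters, note that going through sets can shuffle them (the output above shows sdi before mda). Here is a minimal sketch of an order-preserving variant. It is not the set subtraction shown above but a plain list comprehension, and it assumes the cells already hold real Python lists. The helper name items_not_in is made up for this example.

```python
import pandas as pd

# Sample data rebuilt from the question; the cells are real Python lists.
df = pd.DataFrame({
    "accountid": [1, 2, 3],
    "col2": [["abc", "def", "xyz"], ["abc", "asd", "xyz", "dib"], ["abc", "def", "xyz"]],
    "col3": [["abc", "mda", "xyz", "sdi"], ["nio", "ouy", "abc"], ["abc", "mda", "xyz"]],
})


def items_not_in(candidates, exclude):
    """Hypothetical helper: items of `candidates` absent from `exclude`, in original order."""
    excluded = set(exclude)  # set lookup keeps the membership test cheap
    return [item for item in candidates if item not in excluded]


# Walk both columns in parallel and build col4 row by row.
df["col4"] = [items_not_in(c3, c2) for c2, c3 in zip(df["col2"], df["col3"])]

print(df["col4"].tolist())
# [['mda', 'sdi'], ['nio', 'ouy'], ['mda']]
```

Unlike the set version, this keeps duplicates that appear in col3, which may or may not be what you want.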

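If that fails, your lists are not actually lists; they are strings. Parse them into lists first:

import ast
# Parse every column except accountid from its string form into a real list.
# On pandas 2.1+, DataFrame.map replaces the deprecated applymap.
df.iloc[:,1:]=df.iloc[:,1:].applymap(ast.literal_eval)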
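
A quick way to check whether this applies to you is to look at the type of a single cell. The sketch below assumes the list-like columns are named col2 and col3 as in the question, and converts them one column at a time with Series.map. That is a swapped-in alternative to the applymap call, chosen only because it behaves the same across pandas versions.

```python
import ast

import pandas as pd

# Sample data where the "lists" are really strings, e.g. after reading a CSV.
df = pd.DataFrame({
    "accountid": [1, 2],
    "col2": ["['abc','def','xyz']", "['abc','asd','xyz','dib']"],
    "col3": ["['abc','mda','xyz','sdi']", "['nio','ouy','abc']"],
})

print(type(df.loc[0, "col2"]))  # str before conversion

# Parse each string into a real Python list, one column at a time.
for col in ["col2", "col3"]:
    df[col] = df[col].map(ast.literal_eval)

print(type(df.loc[0, "col2"]))  # list after conversion
```

ast.literal_eval raises an exception on a malformed cell (for example a missing closing quote), so bad rows show up right away instead of silently passing through.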

Try this. With axis=1, the lambda function is applied to each row:

df['col4'] = df.apply(lambda x : list(set(x['col3']).difference(set(x['col2']))), axis=1)

