
How to compare two columns that both contain lists of strings and create a new column with the unique items?

I have two columns that both contain lists of strings. One column, df['products'], holds product names in all capitals. The other column, df['desc'], holds the product description.

I want to check which items in df['products'] are also present in df['desc'] and put them in a new column.

I tried the following code:

df['uniq'] = df.apply(lambda x : [i for i in x['products'] if i.lower() in x['desc']])

I checked the other similar questions and built the above code, but it's not working.

The data looks something like this:

[image: enter image description here]

Don't use apply() when you don't absolutely need to. It's slow.

Instead, do it the vectorized way:

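# Assumes each cell holds a single string rather than a list of strings.
# isin() tests each product against the whole desc column, not row by row.
desc_upper = df['desc'].str.upper()
matches = df['products'].isin(desc_upper)
result = df['products'][matches]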
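
To make the behaviour of the isin() approach concrete, here is a minimal sketch on made-up data in which every cell is a single string (an assumption; the question's cells are lists). It shows that a product counts as a match if its name appears anywhere in the desc column, not only in the same row.

```python
import pandas as pd

# Hypothetical sample data: one plain string per cell.
df = pd.DataFrame({'products': ['A', 'B', 'C'],
                   'desc': ['b', 'x', 'a']})

desc_upper = df['desc'].str.upper()          # 'B', 'X', 'A'
matches = df['products'].isin(desc_upper)    # True, True, False
result = df['products'][matches]

# 'A' matches even though its own row's desc is 'b': the test is column-wide.
print(result.tolist())
```

If the comparison has to be made row by row, or the cells are lists, this approach probably does not fit as written.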

It seems you need to add axis=1 so that the function is applied to each row:

import pandas as pd

df = pd.DataFrame({'products':[['A','B'],['D','C']],
                   'desc':[['a', 'c'],['c', 'e']]})

# axis=1 passes each row (not each column) to the lambda
df['uniq'] = df.apply(lambda x: [i for i in x['products'] if i.lower() in x['desc']], axis=1)
print(df)
     desc products uniq
0  [a, c]   [A, B]  [A]
1  [c, e]   [D, C]  [C]
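
If apply() turns out to be too slow, one possible alternative is a plain list comprehension over the two columns zipped together. This is only a sketch on the same sample data; the helper name matching_products is made up for illustration, and whether it is actually faster depends on the data.

```python
import pandas as pd

df = pd.DataFrame({'products': [['A', 'B'], ['D', 'C']],
                   'desc': [['a', 'c'], ['c', 'e']]})

def matching_products(products, desc):
    """Return the products whose lowercase form appears in desc (hypothetical helper)."""
    desc_set = set(desc)  # set lookup avoids rescanning the list for every product
    return [p for p in products if p.lower() in desc_set]

# zip pairs the two columns row by row, so no apply() call is needed.
df['uniq'] = [matching_products(p, d) for p, d in zip(df['products'], df['desc'])]
print(df['uniq'].tolist())
```

The order of items in each result list follows the order in df['products'].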


 