
Python: compare two lists of strings in a Pandas DataFrame

I would like to check whether each word in the labels list exists in each list in the 'bigrams' column.

If one of these words exists in a bigram list, I would like to replace the label None with the word that was found.

I tried writing two consecutive for loops, but it doesn't work. I also tried a list comprehension.

How can I do this?


You can use pd.Series.str.extract:

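import pandas as pd

df = pd.DataFrame({'bgrams': [['hello','goodbye'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
#             bgrams label
#0  [hello, goodbye]  None
#1        [dog, cat]  None
#2             [cow]  None

labels=['cat','goodbye']

# Build an alternation with a single capture group: '(cat|goodbye)'
regex='('+'|'.join(labels)+')'

# Convert each list to its string form, then extract the first match (NaN if none)
df['label']=df.bgrams.astype(str).str.extract(regex, expand=False)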

Output:

df
             bgrams    label
0  [hello, goodbye]  goodbye
1        [dog, cat]      cat
2             [cow]      NaN
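
Note that the regex above matches substrings of the stringified list, so a label such as 'cat' would also match inside a longer word such as 'caterpillar'. If exact word matches are needed, one alternative (not part of the answer above, just a minimal sketch) is to test list membership directly. The helper name first_label and the sample rows are assumptions made for illustration:

```python
import pandas as pd

# Sample data (assumed for illustration); 'caterpillar' must NOT match 'cat'
df = pd.DataFrame({
    'bgrams': [['hello', 'goodbye'], ['dog', 'caterpillar'], ['cow', 'cat']],
})
labels = ['cat', 'goodbye']
label_set = set(labels)  # a set makes each membership test cheap

def first_label(words):
    """Return the first word in `words` that is a label, or None if there is none."""
    return next((w for w in words if w in label_set), None)

# Hypothetical helper applied row by row to the list column
df['label'] = df['bgrams'].apply(first_label)
```

This compares whole list elements instead of searching text, so no regex escaping is needed, at the cost of a Python-level loop over the rows.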

For multiple matches, you can use pd.Series.str.findall:

df = pd.DataFrame({'bgrams': [['hello','goodbye','cat'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
#                  bgrams label
#0  [hello, goodbye, cat]  None
#1             [dog, cat]  None
#2                  [cow]  None

labels=['cat','goodbye']

regex='('+'|'.join(labels)+')'

# findall returns every match in a row as a list (an empty list if there is none)
df['label']=df.bgrams.astype(str).str.findall(regex)

Output:

df
                  bgrams           label
0  [hello, goodbye, cat]  [goodbye, cat]
1             [dog, cat]           [cat]
2                  [cow]              []
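
If the labels may contain regex metacharacters, or may appear as parts of longer words, the pattern can be hardened. The following is a sketch under two assumptions that the original answer does not state: every label begins and ends with a word character (so that \b works as a delimiter), and the sample rows are made up for illustration. It uses re.escape from the standard library and a non-capturing group so that findall returns whole matches:

```python
import re
import pandas as pd

# Sample data (assumed for illustration); 'caterpillar' must NOT match 'cat'
df = pd.DataFrame({
    'bgrams': [['hello', 'goodbye', 'cat'], ['dog', 'caterpillar'], ['cow']],
})
labels = ['cat', 'goodbye']

# Escape each label, then require word boundaries on both sides
pattern = r'\b(?:' + '|'.join(re.escape(label) for label in labels) + r')\b'

# Same approach as above: stringify each list, then collect every match per row
df['label'] = df['bgrams'].astype(str).str.findall(pattern)
```

Matches are returned in the order in which they appear in each row, not in the order of the labels list.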

