Check if each value in a dataframe column contains words from another dataframe column

Question

How do I iterate through each value in one dataframe column and check if it contains words in another dataframe column?

a = pd.DataFrame({'text': ['the cat jumped over the hat', 'the pope pulled on the rope', 'i lost my dog in the fog']})
b = pd.DataFrame({'dirty_words': ['cat', 'dog', 'parakeet']})

a    
    text
0   the cat jumped over the hat
1   the pope pulled on the rope
2   i lost my dog in the fog

b
    dirty_words
0   cat
1   dog
2   parakeet

I want to get a new dataframe that contains only these values:

result

0   the cat jumped over the hat
1   i lost my dog in the fog

Answer 1

You can use a list comprehension with any after splitting each string on whitespace. Because it compares whole tokens, this method won't match "catheter" just because it contains "cat".

mask = [any(i in words for i in b['dirty_words'].values)
        for words in a['text'].str.split().values]

print(a[mask])

                          text
0  the cat jumped over the hat
2     i lost my dog in the fog
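
As a possible variation on the same idea, here is a minimal sketch that builds a set of the words once and tests each row's tokens against it with set.isdisjoint. The fourth row ("catheter") is not part of the question; it is added here only to illustrate the whole-token behaviour.

```python
import pandas as pd

a = pd.DataFrame({'text': ['the cat jumped over the hat',
                           'the pope pulled on the rope',
                           'i lost my dog in the fog',
                           'the nurse replaced the catheter']})  # extra row, for illustration only
b = pd.DataFrame({'dirty_words': ['cat', 'dog', 'parakeet']})

# Build the lookup set once so each membership test is a hash lookup.
dirty = set(b['dirty_words'].dropna())

# Keep a row when its whitespace-separated tokens share at least one word with the set.
mask = [not dirty.isdisjoint(words) for words in a['text'].str.split()]

result = a[mask]
print(result)
```

Note that tokens are compared exactly, so a word followed by punctuation (for example "cat,") would not count as a match with this approach.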

Answer 2

You can also use isin after str.split:

a[pd.DataFrame(a.text.str.split().tolist()).isin(b.dirty_words.tolist()).any(axis=1)]

                          text
0  the cat jumped over the hat
2     i lost my dog in the fog
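
The one-liner can be hard to read, so here is a sketch of the same isin idea broken into steps. It uses str.split(expand=True) to build the token frame, which should be equivalent to the tolist() construction for this data; the intermediate names tokens and hits are just illustrative.

```python
import pandas as pd

a = pd.DataFrame({'text': ['the cat jumped over the hat',
                           'the pope pulled on the rope',
                           'i lost my dog in the fog']})
b = pd.DataFrame({'dirty_words': ['cat', 'dog', 'parakeet']})

# One column per token; rows with fewer tokens are padded with missing values.
tokens = a['text'].str.split(expand=True)

# True wherever a token equals one of the listed words.
hits = tokens.isin(b['dirty_words'].tolist())

# Keep rows that have at least one hit in any column.
result = a[hits.any(axis=1)]
print(result)
```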

Answer 3

Use regex matching with str.contains.

p = '|'.join(b['dirty_words'].dropna())
# Group the alternatives so that \b applies to every word, not just the first and last.
a[a['text'].str.contains(r'\b(?:{})\b'.format(p))]

                          text
0  the cat jumped over the hat
2     i lost my dog in the fog

The word boundaries ensure you won't match "catch" just because it contains "cat" (thanks @DSM).
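
A self-contained sketch of the word-boundary approach follows. It wraps the alternation in a non-capturing group, because without a group \b binds only to the first and last alternatives, and it passes each word through re.escape in case a word contains regex metacharacters. The "catch" row is not from the question; it is added only to show the difference.

```python
import re
import pandas as pd

a = pd.DataFrame({'text': ['the cat jumped over the hat',
                           'the pope pulled on the rope',
                           'i lost my dog in the fog',
                           'try to catch the ball']})  # extra row, for illustration only
b = pd.DataFrame({'dirty_words': ['cat', 'dog', 'parakeet']})

words = b['dirty_words'].dropna()

# Escape each word, then group the alternatives so both \b anchors apply to all of them.
pattern = r'\b(?:{})\b'.format('|'.join(map(re.escape, words)))
mask = a['text'].str.contains(pattern)

# For comparison: the ungrouped pattern parses as (\bcat) | dog | (parakeet\b).
loose = a['text'].str.contains(r'\bcat|dog|parakeet\b')

result = a[mask]
print(result)
```

With the grouped pattern the "catch" row is excluded, while the ungrouped one lets it through via the \bcat alternative.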
