Counting a list of words in a list of strings using Python

Question

So I have a pandas dataframe with rows of tokenized strings in a column named story. I also have a list of words in a list called selected_words. I am trying to count the instances of any of the selected_words in each of the rows in the column story.

The code I used before, which had worked, is:

CCwordsCount=df4.story.str.count('|'.join(selected_words))

This is now giving me NaN values for every row.

Below are the first few rows of the column story in df4. The dataframe contains a little over 400 rows of NYTimes articles.

0      [it, was, a, curious, choice, for, the, good, ...
1      [when, he, was, a, yale, law, school, student,...
2      [video, bitcoin, has, real, world, investors, ...
3      [bitcoin, s, wild, ride, may, not, have, been,...
4      [amid, the, incense, cheap, art, and, herbal, ...
5      [san, francisco, eight, years, ago, ernie, all...

This is the list of selected_words

selected_words = ['accept', 'believe', 'trust', 'accepted', 'accepts', 'trusts', 'believes', \
                  'acceptance', 'trusted', 'trusting', 'accepting', 'believes', 'believing', 'believed',\
                 'normal', 'normalize', ' normalized', 'routine', 'belief', 'faith', 'confidence', 'adoption', \
                  'adopt', 'adopted', 'embrace', 'approve', 'approval', 'approved', 'approves']

Link to my df4.csv file

Answer 1

The str.find() method (or the in operator, as used below) can be useful here, and there are many ways to implement this. If you have no other use for the raw articles and they can be treated as one long string, try the following. You could also put them in a dictionary and loop over it.

def find_words(text, words):
    return [word for word in words if word in text]

sentences = "0  [it, was, a, curious, choice, for, the, good, 1      [when, he, was, a, yale, law, school, student, 2      [video, bitcoin, has, real, world, investors, 3      [bitcoin, s, wild, ride, may, not, have, been, 4      [amid, the, incense, cheap, art, and, herbal, 5      [san, francisco, eight, years, ago, ernie, all"

search_keywords=['accept', 'believe', 'trust', 'accepted', 'accepts', 'trusts', 'believes', \
                  'acceptance', 'trusted', 'trusting', 'accepting', 'believes', 'believing', 'believed',\
                 'normal', 'normalize', ' normalized', 'routine', 'belief', 'faith', 'confidence', 'adoption', \
                  'adopt', 'adopted', 'embrace', 'approve', 'approval', 'approved', 'approves', 'good']

found = find_words(sentences, search_keywords)

print(found)

Note: I didn't have a pandas DataFrame in mind when I wrote this.
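
One caveat with the snippet above: on a string, word in text is a substring test, so 'accept' is also found inside 'accepted', and the result says which words occur rather than how often. Below is a minimal sketch of a per-token count instead, assuming each story is already a real Python list of tokens. The function count_selected and the sample story are made up for illustration and are not from the question.

```python
from collections import Counter

def count_selected(tokens, words):
    """Count the tokens that exactly equal one of the selected words."""
    # Strip stray whitespace such as the leading space in ' normalized'.
    wanted = {w.strip() for w in words}
    counts = Counter(t for t in tokens if t in wanted)
    return sum(counts.values()), counts

# Hypothetical sample data, not taken from df4.
story = ["they", "accepted", "the", "offer", "on", "trust", "and", "trust", "alone"]
selected = ["accept", "trust", "accepted", " normalized"]

total, per_word = count_selected(story, selected)
print(total)           # 3
print(dict(per_word))  # {'accepted': 1, 'trust': 2}
```

Because the comparison is per token, 'accept' does not match 'accepted' here; whether that is the desired behaviour depends on how the counts are meant to be used.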

Answer 2

Each story entry appears to be the string form of a list rather than an actual list.

Use map to strip the surrounding brackets from that string before applying the .str accessor, as follows.

CCwordsCount = df4.story.map(lambda x: ''.join(x[1:-1])).str.count('|'.join(selected_words))

print(CCwordsCount.head(20))   # Show first 20 story results

Output

0      1
1      2
2      5
3      7
4      0
5      1
6     10
7      8
8      2
9      2
10     8
11     0
12     0
13     2
14     0
15     4
16     2
17     9
18     0
19     0
Name: story, dtype: int64

Explanation

Each story was a list that had been converted to a string, so basically:

"['it', 'was', 'a', 'curious', 'choice', 'for', 'the', 'good', 'wife', ...]"

The slice x[1:-1] drops the leading '[' and the trailing ']', and ''.join reassembles the remaining characters into a single string:

map(lambda x: ''.join(x[1:-1]))

This results in quoted words separated by commas. For the first row, this gives the string:

'it', 'was', 'a', 'curious', 'choice', 'for', ...
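
Note that the pattern '|'.join(selected_words) counts substring matches, so 'accept' also matches inside 'acceptance'. If whole-word counts are wanted, something along these lines may help. It is only a sketch, assuming each entry is the string form of a list as shown above; parse_story, count_whole_words and the sample row are hypothetical and not part of the original code.

```python
import ast
import re

def parse_story(raw):
    """Turn a string like "['it', 'was', ...]" back into a list of tokens."""
    return ast.literal_eval(raw)

def count_whole_words(tokens, words):
    """Count the tokens that match one of the words exactly."""
    # Strip stray whitespace and escape regex metacharacters in each word.
    pattern = re.compile("|".join(re.escape(w.strip()) for w in words))
    return sum(1 for t in tokens if pattern.fullmatch(t))

# Hypothetical sample row, not taken from df4.
raw = "['it', 'was', 'accepted', 'with', 'acceptance', 'and', 'trust']"
selected = ["accept", "accepted", "trust"]

tokens = parse_story(raw)
whole = count_whole_words(tokens, selected)
substr = len(re.findall("|".join(selected), raw))
print(whole)   # 2: 'accepted' and 'trust'
print(substr)  # 3: 'accept' also matches inside 'accepted' and 'acceptance'
```

On the dataframe from the question this could presumably be applied with df4.story.map(parse_story).map(lambda t: count_whole_words(t, selected_words)), though that has not been run against the actual df4.csv.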

2 answers

solution1
0 2020-05-13 15:16:44

solution2
0 ACCEPTED 2020-05-13 15:43:21
