search multiple keywords python

How can I improve my code so that it searches a specific column of a DataFrame using a list of keywords and returns the rows that contain any of them? The current code only accepts two keywords!

contain_values = df[df['tweet'].str.contains('free','news')]
contain_values.head()

Series.str.contains takes a regular expression, per the documentation. Either construct a single regular expression from your values, or use a for-loop to check each keyword one by one and then combine the results.

Thus (for the regular expression):

regex = '|'.join(['free', 'news'])
df['tweet'].str.contains(regex, case=False, na=False)

Note that you cannot pass a list directly to Series.str.contains; it will raise an error. You probably also want to pass case=False to make the match case-insensitive, and na=False to return False for any NaN in your tweet column (for example, a retweet with no comment).
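The for-loop alternative mentioned above is not shown in the answer, so here is a minimal sketch of how it might look. The sample data and the names keywords, mask and matches are assumptions made for illustration; only the 'tweet' column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'tweet' comes from the question.
df = pd.DataFrame({"tweet": ["free stuff", "Breaking NEWS", None, "hello world"]})
keywords = ["free", "news"]

# Start from an all-False mask and OR in one keyword at a time.
mask = pd.Series(False, index=df.index)
for keyword in keywords:
    # regex=False treats each keyword as a literal substring, not a pattern.
    mask |= df["tweet"].str.contains(keyword, case=False, na=False, regex=False)

matches = df[mask]
print(matches)
```

Compared with the joined regular expression, this checks the column once per keyword, but it needs no escaping because each keyword is matched literally.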

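Your code currently returns only the tweets that contain 'free'. 'news' is not treated as a second keyword: the second positional argument of .str.contains() is case, not another pattern. Let's test it: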

>>> df
           tweet
0     free stuff
1   newsnewsnews
2    hello world
3  another tweet
>>> df[df['tweet'].str.contains('free', 'news')]
        tweet
0  free stuff

See the documentation for .str.contains(): you can pass either a single word or a regular expression. This will work:

df[df['tweet'].str.contains('free|news|hello')]

Here I've added a third keyword, and now the first three rows of my dataframe are returned:

          tweet
0    free stuff
1  newsnewsnews
2   hello world
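To go from a hand-written pattern to a keyword list of any length, the pattern can be built programmatically. The sketch below rebuilds the example dataframe and, as an added precaution not present in the answers, escapes each keyword with re.escape so that characters such as '.' or '+' are matched literally. The whole-word variant using \b is likewise only an optional assumption about what the asker may want.

```python
import re

import pandas as pd

# Same four tweets as in the example above.
df = pd.DataFrame(
    {"tweet": ["free stuff", "newsnewsnews", "hello world", "another tweet"]}
)
keywords = ["free", "news", "hello"]  # any number of keywords

# Escape each keyword, then join with '|' so the regex means "any of these".
pattern = "|".join(re.escape(keyword) for keyword in keywords)
contain_values = df[df["tweet"].str.contains(pattern, case=False, na=False)]
print(contain_values)

# Optional variant: match whole words only. The non-capturing group (?:...)
# keeps the alternation together between the two word boundaries.
whole_word = r"\b(?:" + pattern + r")\b"
whole_word_values = df[df["tweet"].str.contains(whole_word, case=False, na=False)]
print(whole_word_values)
```

With the whole-word pattern, 'newsnewsnews' no longer matches, because no single 'news' inside it is surrounded by word boundaries.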


 