Search multiple keywords in Python

How can I improve my code so that it searches a specific column of a DataFrame using a list of keywords and returns the rows that contain any of those values? The current code only accepts two keywords!

contain_values = df[df['tweet'].str.contains('free','news')]
contain_values.head()

Series.str.contains takes a regular expression, per the documentation. Either construct a regular expression from your values, or use a for-loop to check each keyword one by one and then combine the results.

Thus (for the regular expression):

regex = '|'.join(['free', 'news'])
df['tweet'].str.contains(regex, case=False, na=False)

Note that you cannot pass a list directly to Series.str.contains; it will raise an error. You probably also want to pass case=False, which makes the match case-insensitive, and na=False, which returns False instead of NaN for any missing values in your tweet column (for example, a retweet with no comment).
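The for-loop alternative mentioned above could look roughly like the following. This is only a sketch: the sample data and the names `keywords` and `mask` are made up for illustration, and `regex=False` is used here so each keyword is treated as a plain substring.

```python
import pandas as pd

# Hypothetical sample data; the 'tweet' column name follows the question.
df = pd.DataFrame({"tweet": ["Free stuff", "newsnewsnews", "hello world", None]})
keywords = ["free", "news"]

# Start with an all-False mask and OR in one literal check per keyword.
mask = pd.Series(False, index=df.index)
for keyword in keywords:
    # regex=False matches the keyword as a plain substring, not a pattern.
    mask |= df["tweet"].str.contains(keyword, case=False, na=False, regex=False)

contain_values = df[mask]
print(contain_values)
```

This makes one pass over the column per keyword, so the single joined regular expression is usually the more compact choice. The loop may still be easier to read when the keywords should not be interpreted as patterns.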

Your code currently only returns tweets that contain 'free' and ignores 'news'. The second positional parameter of .str.contains() is case, so 'news' is read as a (truthy) case flag rather than as a second keyword. Let's test it:

>>> df
           tweet
0     free stuff
1   newsnewsnews
2    hello world
3  another tweet
>>> df[df['tweet'].str.contains('free', 'news')]
        tweet
0  free stuff

See the documentation for .str.contains(): you can pass either a single word or a regular expression. This will work:

df[df['tweet'].str.contains('free|news|hello')]

Here I've added a third keyword, and now the first three rows of my DataFrame are returned:

          tweet
0    free stuff
1  newsnewsnews
2   hello world
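If the keywords come from a list and might contain regular-expression metacharacters, one option is to escape each keyword before joining them with '|'. The following is a minimal sketch under that assumption; the extra 'c++ tips' row and the 'c++' keyword are invented here to show the escaping, and are not part of the original question.

```python
import re

import pandas as pd

# Sample data from the answer above, plus one hypothetical row with '+' in it.
df = pd.DataFrame(
    {"tweet": ["free stuff", "newsnewsnews", "hello world", "another tweet", "c++ tips"]}
)
keywords = ["free", "news", "c++"]

# Escape each keyword so characters such as '+' are matched literally,
# then join the alternatives with '|' (regex OR).
pattern = "|".join(re.escape(keyword) for keyword in keywords)

matches = df[df["tweet"].str.contains(pattern, case=False, na=False)]
print(matches)
```

Without re.escape, the keyword 'c++' would not be a valid regular expression, so building the pattern this way keeps arbitrary keyword lists safe to pass to .str.contains().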
