How to check the text of a tweet for a specific keyword from an array in Python
Hi, I am having an issue searching for a specific piece of text within a tweet. I am currently using tweepy to stream tweets based on an array of keywords (called filterKeywords), but I want a specific function to run depending on which keyword the tweet was filtered by.
In my on_data method I load the tweet into a JSON variable and use a for loop to cycle through the filterKeywords array, with an if statement that checks whether the current element of filterKeywords appears in the 'text' field of the JSON tweet. However, it doesn't seem to filter anything and appears to go straight to the else branch of the if statement.
My code is below. Any help would be much appreciated.
Thanks
import tweepy
import pymongo
import json
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
filterKeywords = ['IBM', 'Microsoft', 'Facebook', 'Twitter', 'Apple', 'Google', 'Amazon', 'EBay', 'Diageo',
'General Motors', 'General Electric', 'Telefonica', 'Rolls Royce', 'Walmart', 'HSBC', 'BP',
'Investec', 'WWE', 'Time Warner', 'Santander Group']
class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        super(tweepy.StreamListener, self).__init__()
        try:
            global conn
            conn = pymongo.MongoClient('localhost', 27017)
            print "Connected successfully!!!"
            global db
            db = conn.mydb
        except pymongo.errors.ConnectionFailure, e:
            print "Could not connect to MongoDB: %s" % e
        conn

    def on_data(self, data):
        datajson = json.loads(data)
        for word in filterKeywords:
            if word in datajson['text']:
                collection = db[word]
                collection.insert(datajson)
                print('Tweet found filtered by ' + word)
            else:
                print('')

    def on_error(self, status_code):
        return True  # Don't kill the stream

    def on_timeout(self):
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
sapi.filter(track=filterKeywords)
I think your problem is that you included "Twitter" in the filter keywords, which matches almost everything (the filter is applied not only to the tweet text but to some other fields as well). Try removing it from the filter keywords.
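To illustrate why a plain check against 'text' can miss tweets that the stream delivered, here is a minimal Python 3 sketch that searches the text together with some entity fields. It rests on assumptions that should be checked against Twitter's documentation of the track parameter: that matching is case-insensitive, that a multi-word keyword such as 'General Motors' needs all of its words but not necessarily adjacent, and that hashtags, mentioned screen names and URLs take part in matching. The helper names (searchable_text, keyword_matches, matched_keywords) are hypothetical, and the substring test only approximates whatever tokenisation Twitter really uses.

```python
def searchable_text(tweet):
    """Join the tweet text with the entity fields assumed to be matched."""
    parts = [tweet.get("text", "")]
    entities = tweet.get("entities", {})
    parts += [h.get("text", "") for h in entities.get("hashtags", [])]
    parts += [m.get("screen_name", "") for m in entities.get("user_mentions", [])]
    for url in entities.get("urls", []):
        parts.append(url.get("expanded_url") or "")
        parts.append(url.get("display_url") or "")
    return " ".join(parts).lower()


def keyword_matches(keyword, haystack):
    """True if every space-separated word of the keyword occurs in haystack."""
    return all(word in haystack for word in keyword.lower().split())


def matched_keywords(tweet, keywords):
    """Return the keywords (in their original order) found in the tweet."""
    haystack = searchable_text(tweet)
    return [k for k in keywords if keyword_matches(k, haystack)]


sample = {
    "text": "New earnings report from general motors",
    "entities": {"hashtags": [{"text": "IBM"}], "urls": [], "user_mentions": []},
}
print(matched_keywords(sample, ["IBM", "General Motors", "Apple"]))
# prints ['IBM', 'General Motors']
```

In the sample, the case-sensitive test `word in datajson['text']` would find neither keyword, because 'IBM' appears only as a hashtag entity and 'general motors' is lower case.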
def on_data(self, data):
    datajson = json.loads(data)
    if any(i in datajson["text"] for i in filterKeywords):
        # Do desired function
        pass
    else:
        print('if statement not working')
There is a simple mistake in your program: even after the if condition succeeds for one keyword, the loop may still enter the else branch on the next iteration.
From your comments: if you wish to avoid the KeyError on 'text', rewrite your function like this:
def on_data(self, data):
    datajson = json.loads(data)
    for word in filterKeywords:
        if datajson.get('text') and word in datajson['text']:
            collection = db[word]
            collection.insert(datajson)
            print('Tweet found filtered by ' + word)
        else:
            print('')
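Because the else branch in the loop above still runs once for every keyword that does not match, one alternative is to collect all matching keywords first and report "no match" a single time. The following is a minimal Python 3 sketch of that idea, not the original poster's code: find_keywords and handle are hypothetical names, and a plain dict named store stands in for the MongoDB collections so the example runs on its own.

```python
import json


def find_keywords(raw_data, keywords):
    """Parse one stream message and return (tweet, keywords found in its text)."""
    tweet = json.loads(raw_data)
    # Some stream messages (for example limit notices) have no 'text' key,
    # so fall back to an empty string instead of raising KeyError.
    text = tweet.get("text", "")
    return tweet, [word for word in keywords if word in text]


def handle(raw_data, keywords, store):
    """File the tweet under every matching keyword; report a miss only once."""
    tweet, matches = find_keywords(raw_data, keywords)
    for word in matches:
        store.setdefault(word, []).append(tweet)
        print("Tweet found filtered by " + word)
    if not matches:
        print("No keyword found in tweet text")
    return matches


store = {}
handle(json.dumps({"text": "Apple and Google announce a deal"}),
       ["Apple", "Google", "IBM"], store)
handle(json.dumps({"limit": {"track": 5}}), ["Apple", "Google", "IBM"], store)
```

Inside on_data, the line that appends to store would be replaced by the database write, for example `db[word].insert_one(datajson)` on PyMongo 3.0 or later, where `insert` is deprecated.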