How to check the text of a tweet for a specific keyword from an array in Python

Hi, I am having an issue with searching for a specific piece of text within a tweet. I am currently using tweepy to stream tweets based on an array of keywords (called filterKeywords); however, I want a specific function to run depending on which keyword the tweet was filtered by.

I load the tweet into a JSON variable and, in my on_data method, use a for loop to cycle through the filterKeywords array, with an if statement that checks whether the current element of filterKeywords appears anywhere in the 'text' field of the JSON tweet. However, it doesn't seem to be filtering anything and seems to go straight to the else statement. My code is below. Any help would be much appreciated. Thanks.

import tweepy
import pymongo
import json

consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

filterKeywords = ['IBM', 'Microsoft', 'Facebook', 'Twitter', 'Apple', 'Google', 'Amazon', 'EBay', 'Diageo',
                  'General Motors', 'General Electric', 'Telefonica', 'Rolls Royce', 'Walmart', 'HSBC', 'BP',
                  'Investec', 'WWE', 'Time Warner', 'Santander Group']


class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        super(tweepy.StreamListener, self).__init__()
        try:
            global conn
            conn = pymongo.MongoClient('localhost', 27017)
            print("Connected successfully!!!")
            global db
            db = conn.mydb
        except pymongo.errors.ConnectionFailure as e:
            print("Could not connect to MongoDB: %s" % e)

    def on_data(self, data):
        datajson = json.loads(data)
        for word in filterKeywords:
            if word in datajson['text']:
                collection = db[word]
                collection.insert(datajson)
                print('Tweet found filtered by ' + word)
        else:
            print('')

    def on_error(self, status_code):
        return True  # Don't kill the stream

    def on_timeout(self):
        return True  # Don't kill the stream


sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))

sapi.filter(track=filterKeywords)

I think your problem is that you included "Twitter" in the filter keywords, and that matches almost everything (not only the text is used for filtering, but some other fields as well). Try removing it from the filter keywords.
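To illustrate how a tweet can come through the stream without the keyword appearing in its 'text' field, here is a minimal sketch that also looks at link URLs, hashtags, and mentioned screen names, which Twitter's track filter is documented to match as well. The helper names searchable_strings and matched_keywords are hypothetical, and the sample tweet is made up for the example.

```python
def searchable_strings(tweet):
    """Collect the strings a track keyword could plausibly have matched."""
    strings = [tweet.get("text", "")]
    entities = tweet.get("entities", {})
    for url in entities.get("urls", []):
        strings.append(url.get("expanded_url") or "")
        strings.append(url.get("display_url") or "")
    for tag in entities.get("hashtags", []):
        strings.append(tag.get("text") or "")
    for mention in entities.get("user_mentions", []):
        strings.append(mention.get("screen_name") or "")
    return strings


def matched_keywords(tweet, keywords):
    """Return the keywords found (case-insensitively) in any collected field."""
    haystack = " ".join(searchable_strings(tweet)).lower()
    return [kw for kw in keywords if kw.lower() in haystack]


# Made-up tweet: the keyword only appears in the expanded link, not in 'text'.
sample_tweet = {
    "text": "Check this out https://t.co/abc123",
    "entities": {
        "urls": [{"expanded_url": "https://twitter.com/someuser/status/1",
                  "display_url": "twitter.com/someuser/status/1"}],
        "hashtags": [],
        "user_mentions": [],
    },
}

print(matched_keywords(sample_tweet, ["IBM", "Twitter"]))
```

With a keyword as broad as "Twitter", links back to twitter.com alone are enough to let a tweet through, so a check against 'text' only will often find nothing.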

def on_data(self, data):
    datajson = json.loads(data)
    if any([i for i in filterKeywords if i in datajson["text"]]):
        """Do Desired function"""
    else:
        print('if statement not working')

There is a simple mistake in your program: even after the if condition succeeds, it may enter the else branch on the next iteration.
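The two possible readings of the else in the question can be compared with a small sketch. It assumes nothing about tweepy; scan_for_else and scan_if_else are hypothetical names that only record which branches run.

```python
def scan_for_else(keywords, text):
    """else attached to the for loop: runs once when the loop ends without break."""
    events = []
    for word in keywords:
        if word in text:
            events.append("matched " + word)
    else:
        events.append("else ran")
    return events


def scan_if_else(keywords, text):
    """else attached to the if: runs for every keyword that does not match."""
    events = []
    for word in keywords:
        if word in text:
            events.append("matched " + word)
        else:
            events.append("no match for " + word)
    return events


text = "Apple unveils a new phone"
print(scan_for_else(["IBM", "Apple"], text))  # ['matched Apple', 'else ran']
print(scan_if_else(["IBM", "Apple"], text))   # ['no match for IBM', 'matched Apple']
```

Either way the else branch runs even when one of the keywords matched, which is why collecting the matches first and then checking whether the list is empty tends to be easier to reason about.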

From your comments: if you wish to avoid the KeyError on 'text', rewrite your function like this:

def on_data(self, data):
    datajson = json.loads(data)
    for word in filterKeywords:
        if datajson.get('text') and word in datajson['text']:
            collection = db[word]
            collection.insert(datajson)
            print('Tweet found filtered by ' + word)
    else:
        print('')


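Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.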
