Python: How to search tweets and store in database?

I've got a nice Python script that currently prints out the past 200 tweets from a given username.

However, I'd like to modify it so that it instead collects the past 200 tweets that include a certain hashtag (from any username) and then stores those results in a database.

Can anyone suggest how to modify the code below?

import sys
import operator
import requests
import json
import twitter

twitter_consumer_key = 'XXXX'
twitter_consumer_secret = 'XXXX'
twitter_access_token = 'XXXX'
twitter_access_secret = 'XXXX'

twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)

handle = 'XXXX'  # placeholder: screen name of the account to read

statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False)

for status in statuses:
    if status.lang == 'en':
        print(status)

I'm not familiar with the twitter package, but here is a suggestion you can work from. Depending on how you want to save the tweets, you can replace the print call with whatever storage you prefer. However, this only filters the user's 200 most recent tweets; it does not fetch 200 tweets that contain a given hashtag.

import sys
import operator
import requests
import json
import twitter

twitter_consumer_key = 'XXXX'
twitter_consumer_secret = 'XXXX'
twitter_access_token = 'XXXX'
twitter_access_secret = 'XXXX'

twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)

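handle = 'XXXX'  # placeholder: screen name of the account to read

statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False)

tag_list = ["Xmas", "Summer"]
for status in statuses:
    if status.lang == 'en':
        # python-twitter exposes hashtags on the status itself;
        # the list may be None when the tweet carries no entities
        hashtags = status.hashtags or []
        if any(hashtag.text in tag_list for hashtag in hashtags):
            print(status)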
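One thing to watch in the filter above is that the hashtag text keeps whatever capitalisation the author typed, so a plain "in tag_list" test misses "#XMAS" or "#xmas". A minimal sketch of a case-insensitive check follows. The helper name matching_hashtags is hypothetical, and it assumes python-twitter Status objects expose a hashtags list whose items have a text attribute; the SimpleNamespace object is only a stand-in for a real status.

```python
from types import SimpleNamespace


def matching_hashtags(status, wanted_tags):
    """Return the wanted tags (lowercased, without '#') found in the status."""
    wanted = {tag.lstrip("#").lower() for tag in wanted_tags}
    # Assumption: status.hashtags is a list of objects with a .text attribute,
    # or None when the tweet carries no entities.
    present = {hashtag.text.lower() for hashtag in (status.hashtags or [])}
    return wanted & present


# Stand-in for a python-twitter Status, used only to demonstrate the helper.
demo = SimpleNamespace(
    hashtags=[SimpleNamespace(text="XMAS"), SimpleNamespace(text="snow")]
)
print(matching_hashtags(demo, ["Xmas", "Summer"]))  # {'xmas'}
```

In the loop, the condition would then read "if matching_hashtags(status, tag_list):" before printing or saving the status.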

I am attaching Java code that prints the past 100 tweets containing the '#engineeringproblems' hashtag (from any user). You need to add the Twitter4J library to your project.

API download link: http://twitter4j.org/en/index.html#download

Java source code:

import java.util.ArrayList;
import java.util.List;

import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

public class HashtagSearch { // class name is a placeholder

    public static void main(String[] args) {

        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
          .setOAuthConsumerKey("xxxx")
          .setOAuthConsumerSecret("xxxx")
          .setOAuthAccessToken("xxxx")
          .setOAuthAccessTokenSecret("xxxx");

        Twitter twitter = new TwitterFactory(cb.build()).getInstance();
        Query query = new Query("#engineeringproblems");
        int numberOfTweets = 100;
        long lastID = Long.MAX_VALUE;
        List<Status> tweets = new ArrayList<>();

        while (tweets.size() < numberOfTweets) {
            // The search API returns at most 100 tweets per request
            query.setCount(Math.min(100, numberOfTweets - tweets.size()));
            try {
                QueryResult result = twitter.search(query);
                List<Status> page = result.getTweets();
                if (page.isEmpty()) {
                    break; // no older tweets left; avoids looping forever
                }
                tweets.addAll(page);
                System.out.println("Gathered " + tweets.size() + " tweets\n");
                for (Status t : page) {
                    lastID = Math.min(lastID, t.getId());
                }
            } catch (TwitterException te) {
                System.out.println("Couldn't connect: " + te);
                break; // stop instead of retrying the same failing request
            }
            // Next request: only tweets older than the oldest one seen so far
            query.setMaxId(lastID - 1);
        }

        for (int i = 0; i < tweets.size(); i++) {
            Status t = tweets.get(i);
            String user = t.getUser().getScreenName();
            String msg = t.getText();

            System.out.println(i + " USER: " + user + " wrote: " + msg + "\n");
        }
    }
}
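For readers who want to stay in Python, the same paging idea (request a page, remember the lowest tweet ID, then ask only for older tweets) can be sketched with the python-twitter package from the question. This is a rough sketch, not a tested client: the function name search_hashtag is hypothetical, and api is assumed to be a twitter.Api instance (or any object with a compatible GetSearch method that returns statuses with an id attribute).

```python
def search_hashtag(api, hashtag, total=200, lang="en"):
    """Collect up to `total` recent tweets containing `hashtag`, newest first."""
    collected = []
    max_id = None
    while len(collected) < total:
        # The search endpoint returns at most 100 tweets per request.
        batch = api.GetSearch(
            term=hashtag,
            count=min(100, total - len(collected)),
            max_id=max_id,
            lang=lang,
            result_type="recent",
        )
        if not batch:
            break  # no older tweets available, so stop paging
        collected.extend(batch)
        # Ask the next request for tweets strictly older than the oldest seen.
        max_id = min(status.id for status in batch) - 1
    return collected
```

With the twitter_api object built in the question, a call would look like search_hashtag(twitter_api, "#engineeringproblems", total=200). Breaking on an empty batch keeps the loop from spinning when fewer matching tweets exist than were requested.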

Sorry, but I've really been looking for a Python solution, and I believe I've finally found one and tested it successfully. The code is below. I'm still looking for a way to modify the script to insert each tweet into a SQL database, but hopefully I can find that elsewhere.

pip install TwitterSearch

from TwitterSearch import *
try:
    tso = TwitterSearchOrder() # create a TwitterSearchOrder object
    tso.set_keywords(['Guttenberg', 'Doktorarbeit']) # let's define all words we would like to have a look for
    tso.set_language('de') # we want to see German tweets only
    tso.set_include_entities(False) # and don't give us all those entity information

    # it's about time to create a TwitterSearch object with our secret tokens
    ts = TwitterSearch(
        consumer_key = 'aaabbb',
        consumer_secret = 'cccddd',
        access_token = '111222',
        access_token_secret = '333444'
    )

    # this is where the fun actually starts :)
    for tweet in ts.search_tweets_iterable(tso):
        print( '@%s tweeted: %s' % ( tweet['user']['screen_name'], tweet['text'] ) )

except TwitterSearchException as e: # take care of all those ugly errors if there are some
    print(e)
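For the remaining step of entering each tweet into a SQL database, one possible approach is Python's built-in sqlite3 module. The sketch below is only an illustration: the function name save_tweets, the table name tweets, and its columns are all assumptions, and the tweets are assumed to be dicts shaped like the ones iterated above (with id, text, user.screen_name and optionally created_at keys).

```python
import sqlite3


def save_tweets(db_path, tweets):
    """Insert tweet dicts into a `tweets` table; return how many rows were new."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tweets ("
                " id INTEGER PRIMARY KEY,"
                " screen_name TEXT NOT NULL,"
                " text TEXT NOT NULL,"
                " created_at TEXT)"
            )
            before = conn.total_changes
            # INSERT OR IGNORE skips tweets whose id is already stored,
            # so re-running the search does not create duplicates.
            conn.executemany(
                "INSERT OR IGNORE INTO tweets (id, screen_name, text, created_at)"
                " VALUES (?, ?, ?, ?)",
                (
                    (t["id"], t["user"]["screen_name"], t["text"], t.get("created_at"))
                    for t in tweets
                ),
            )
            return conn.total_changes - before
    finally:
        conn.close()
```

In the script above, the print loop could then be replaced by a single call such as save_tweets("tweets.db", ts.search_tweets_iterable(tso)). Using the tweet ID as the primary key is a design choice that makes repeated runs idempotent.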
