Not able to get Country of a Tweet - Twython API

I am using the following code to collect tweets on a certain topic, but in every tweet I have extracted the 'place' attribute is None. Am I doing something wrong? Note that the code is meant to extract existing tweets, so I am not looking for a streaming API solution such as this one: https://www.quora.com/How-can-I-get-a-stream-of-tweets-from-a-particular-country-using-Twitter-API

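import sys

from twython import Twython

# consumer_key, consumer_secret, access_key and access_secret are the app credentials
api =   Twython(consumer_key, consumer_secret, access_key, access_secret)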

tweets                          =   []
MAX_ATTEMPTS                    =   200
COUNT_OF_TWEETS_TO_BE_FETCHED   =   10000
in_max_id = sys.argv[1]
next_max_id = ''
for i in range(0,MAX_ATTEMPTS):

    if(COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
        break # we have collected enough tweets

    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    # STEP 2: Save the returned tweets
    # STEP 3: Get the next max_id
    #----------------------------------------------------------------#

    # STEP 1: Query Twitter
    if(0 == i):
        # Query twitter for data. 
        results    = api.search(q="#something",count='100',lang='en',max_id=in_max_id,include_entities='true',geo= True)
    else:
        # After the first call we should have max_id from result of previous call. Pass it in query.
        results    = api.search(q="#something",include_entities='true',max_id=next_max_id,lang='en',geo= True)

    # STEP 2: Save the returned tweets
    for result in results['statuses']:
        tweets.append(result)  # keep the tweet so the count check above can trigger

        temp = ""
        tweet_text = result['text']
        temp += tweet_text + " "
        hashtags = result['entities']['hashtags']
        for hashtag in hashtags:  # separate name so the outer loop counter i is not overwritten
            temp += hashtag['text'] + " "
        print(result)
        #temp += result["place"]["country"] + "\n"
        #output_file.write(temp)




    # STEP 3: Get the next max_id
    try:
        # Parse the data returned to get the max_id to pass in the subsequent call.
        next_results_url_params    = results['search_metadata']['next_results']
        next_max_id        = next_results_url_params.split('max_id=')[1].split('&')[0]
    except (KeyError, IndexError):
        # No more next pages
        break
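
The pagination step (STEP 3) could also be written with the standard library's query-string parser instead of manual string splitting. This is only a sketch, assuming `next_results` has the form `?max_id=...&q=...`; `extract_max_id` is a hypothetical helper name, not part of Twython.

```python
from urllib.parse import parse_qs


def extract_max_id(next_results):
    """Return the max_id value from a next_results query string, or None.

    Hypothetical helper: next_results is assumed to look like
    "?max_id=123&q=%23something&count=100".
    """
    if not next_results:
        return None
    # Drop the leading "?" and parse the remaining key=value pairs.
    params = parse_qs(next_results.lstrip('?'))
    values = params.get('max_id')
    return values[0] if values else None
```

Inside the loop this might be used as `next_max_id = extract_max_id(results['search_metadata'].get('next_results'))`, breaking out when the result is `None`.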

If the place field is a must for every tweet your app will process, you can restrict your search to a location so that the results are far more likely to carry it. Note that a geographic search can also match tweets by the author's profile location, so it is still worth checking that place is present.

You can do so by setting the geocode parameter (latitude,longitude,radius[km/mi]) to limit your search to an area.

An example of such a request via Twython is:

geocode = '25.032341,55.385557,100mi'
api.search(q="#something",count='100',lang='en',include_entities='true',geocode=geocode)
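
The geocode string can also be assembled from numeric values rather than typed by hand. A minimal sketch, assuming the `latitude,longitude,radius[km/mi]` format described above; `build_geocode` is a hypothetical helper, not a Twython function.

```python
def build_geocode(latitude, longitude, radius, unit="km"):
    """Build a geocode string such as '25.032341,55.385557,100mi'.

    Hypothetical helper: validates the inputs and joins them in the
    latitude,longitude,radius[km/mi] order.
    """
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if radius <= 0:
        raise ValueError("radius must be positive")
    return "{},{},{}{}".format(latitude, longitude, radius, unit)
```

The result can then be passed as the `geocode` argument of `api.search`, as in the request above.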

The short answer is no, you are doing nothing wrong. The reason all place tags are empty is that, statistically, they are very unlikely to contain data. Only about 1% of all tweets have data in their place tag. This is because users rarely tweet their location; location is off by default.

Download 100 or more tweets and you will probably find place tag data.
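
To see how sparse the place data is in a given batch, one could measure the fraction of downloaded tweets whose place is populated. A rough sketch, assuming each tweet is a dict as returned in `results['statuses']`; `place_coverage` is a hypothetical name.

```python
def place_coverage(tweets):
    """Return the fraction of tweets (dicts) with a non-empty 'place' value."""
    if not tweets:
        return 0.0
    # A tweet counts only if 'place' exists and is neither None nor empty.
    with_place = sum(1 for tweet in tweets if tweet.get('place'))
    return with_place / len(tweets)
```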

Not all tweets have all fields, such as tweet_text, place, country, language, etc.

So, to avoid a KeyError, use the following approach. Modify your code so that when the key you are looking for is not found, a default value is returned.

country = (result.get('place') or {}).get('country')

Here, the above line means "fetch the key place and, if it exists, look up the key country in it; otherwise return None".
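
The same defaulting idea can be wrapped in a small function so it is reusable across the loop. A minimal sketch; `get_country` and its `default` parameter are hypothetical names chosen for illustration.

```python
def get_country(tweet, default=None):
    """Return tweet['place']['country'] if present, otherwise default.

    Handles both a missing 'place' key and a 'place' that is None.
    """
    place = tweet.get('place') or {}
    return place.get('country', default)
```

For example, `get_country(result, default="unknown")` could be appended to the output line for each tweet without risking a KeyError or TypeError.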

kmario is right. Most tweets don't have this information, but a small percentage do. Doing a location search will increase the chance of getting it, e.g. https://api.twitter.com/1.1/search/tweets.json?q=place%3Acba60fe77bc80469&count=1

  "place": {
    "id": "cba60fe77bc80469",
    "url": "https://api.twitter.com/1.1/geo/id/cba60fe77bc80469.json",
    "place_type": "city",
    "name": "Tallinn",
    "full_name": "Tallinn, Harjumaa",
    "country_code": "EE",
    "country": "Eesti",
    "contained_within": [],
    "bounding_box": {
      "type": "Polygon",
      "coordinates": [
        [
          [
            24.5501404,
            59.3518286
          ],
          [
            24.9262886,
            59.3518286
          ],
          [
            24.9262886,
            59.4981855
          ],
          [
            24.5501404,
            59.4981855
          ]
        ]
      ]
    },
    "attributes": {}
  },
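
Once a tweet does carry a place object like the one above, the country and an approximate location can be read from it. A small sketch, assuming the `bounding_box` coordinates are `[longitude, latitude]` pairs as in the sample; `bounding_box_center` is a hypothetical helper.

```python
def bounding_box_center(place):
    """Return (latitude, longitude) of the center of a place's bounding box.

    Assumes place['bounding_box']['coordinates'][0] is a list of
    [longitude, latitude] corner pairs, as in the sample above.
    """
    corners = place['bounding_box']['coordinates'][0]
    lons = [corner[0] for corner in corners]
    lats = [corner[1] for corner in corners]
    return sum(lats) / len(lats), sum(lons) / len(lons)


# Trimmed copy of the sample place object shown above.
sample_place = {
    "country_code": "EE",
    "country": "Eesti",
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [[
            [24.5501404, 59.3518286],
            [24.9262886, 59.3518286],
            [24.9262886, 59.4981855],
            [24.5501404, 59.4981855],
        ]],
    },
}

center_lat, center_lon = bounding_box_center(sample_place)
```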
