Python 3.4 Twitter data scrape error: KeyError: 'user'
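I have extracted 4 GB of Twitter data into a JSON text file. Now I am trying to go through the file and extract user locations. When I run the script below, I get this error: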
```
File "filepath/test.py", line 18
    if tweet['user']['id']:
KeyError: 'user'
```
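Is it possible that the user ID is missing from some of the collected tweets? I thought it could not be empty. I collected smaller samples, and in three out of four of them I get the same error. The script works on only one dataset, and I have not found any difference in JSON structure between them.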
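One way to find out how the failing samples differ is to look only at the lines that have no `user` field and record which top-level keys they have instead. The sketch below assumes one JSON object per line, as in the script. If the data came from Twitter's streaming API, such lines are typically not tweets at all but status-deletion notices (`{"delete": ...}`) or rate-limit notices (`{"limit": ...}`), which carry no `user` object. The function name `find_non_tweets` is hypothetical.

```python
import json
from collections import Counter


def find_non_tweets(lines):
    """Count the top-level key sets of JSON lines that have no 'user' field.

    Assumes one JSON object per line; blank lines are skipped.
    """
    shapes = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # json.loads('') would raise a ValueError
        obj = json.loads(line)
        if 'user' not in obj:
            # Record which keys the object has instead, e.g. ('delete',)
            shapes[tuple(sorted(obj))] += 1
    return shapes
```

To run it on a sample file: `with open('test_with_sample.json') as f: print(find_non_tweets(f))`. An empty result means every line has a `user` field and the error comes from somewhere else.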
```python
import json

# Tweets are stored in the file "fname". In the file used for this script,
# each tweet was stored on one line
fname = 'test_with_sample.json'
with open(fname, 'r') as f:
    # Create dictionary to later be stored as JSON. All data will be included
    # in the list 'data'
    users_with_geodata = {
        "data": []
    }
    all_users = []
    total_tweets = 0
    geo_tweets = 0
    for line in f:
        tweet = json.loads(line)
        if tweet['user']['id']:
            total_tweets += 1
            user_id = tweet['user']['id']
            if user_id not in all_users:
                all_users.append(user_id)
                # Give users some data to find them by. User_id listed separately
                # to make iterating this data later easier
                user_data = {
                    "user_id": tweet['user']['id'],
                    "features": {
                        "name": tweet['user']['name'],
                        "id": tweet['user']['id'],
                        "screen_name": tweet['user']['screen_name'],
                        "tweets": 1,
                        "location": tweet['user']['location'],
                    }
                }
                if tweet['place']:
                    user_data["features"]["primary_geo"] = (
                        tweet['place']['full_name'] + ", " + tweet['place']['country'])
                    user_data["features"]["geo_type"] = "Tweet place"
                else:
                    user_data["features"]["primary_geo"] = tweet['user']['location']
                    user_data["features"]["geo_type"] = "User location"
                # Add only tweets with some geo data to .json. Comment this if you want to include all tweets.
                # geo_tweets is summed after the loop, so it is not incremented here
                if user_data["features"]["primary_geo"]:
                    users_with_geodata['data'].append(user_data)
            # If user already listed, increase their tweet count
            else:
                for user in users_with_geodata["data"]:
                    if user_id == user["user_id"]:
                        user["features"]["tweets"] += 1
        # except KeyError:
        #     pass

# Count the total amount of tweets for those users that had geodata
for user in users_with_geodata["data"]:
    geo_tweets += user["features"]["tweets"]
# Get some aggregated numbers on the data
print("The file included " + str(len(all_users)) + " unique users who tweeted with or without geo data")
print("The file included " + str(len(users_with_geodata['data'])) +
      " unique users who tweeted with geo data, including 'location'")
print("The users with geo data tweeted " + str(geo_tweets) + " out of a total of " +
      str(total_tweets) + " tweets.")
# Save data to JSON file
with open('users_geo_sample.json', 'w') as fout:
    fout.write(json.dumps(users_with_geodata, indent=4))
```
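The script checks `user_id not in all_users` against a list and then scans `users_with_geodata["data"]` for every repeat tweet, so both steps get slower as the number of users grows, which matters on a 4 GB file. A minimal sketch of the same bookkeeping with a dictionary keyed by user ID follows. It assumes the field names the script already uses (`user`, `place`, `full_name`, `country`, `location`) and skips lines without a `user` object; the functions `summarize_users` and `geo_summary` are hypothetical names.

```python
import json


def summarize_users(lines):
    """Aggregate tweets per user from one-JSON-object-per-line strings.

    Returns (users, total_tweets). users maps user ID to a features dict
    shaped like the one the script builds.
    """
    users = {}  # user ID -> features; average O(1) lookup and update
    total_tweets = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        user = tweet.get('user')
        if not user:
            continue  # not a tweet, e.g. a delete or limit notice
        total_tweets += 1
        user_id = user['id']
        if user_id in users:
            users[user_id]['tweets'] += 1  # repeat tweet from a known user
            continue
        # First tweet from this user: prefer the tweet's place, else the profile location
        place = tweet.get('place')
        if place:
            primary_geo = place['full_name'] + ', ' + place['country']
            geo_type = 'Tweet place'
        else:
            primary_geo = user.get('location')
            geo_type = 'User location'
        users[user_id] = {
            'name': user.get('name'),
            'id': user_id,
            'screen_name': user.get('screen_name'),
            'tweets': 1,
            'location': user.get('location'),
            'primary_geo': primary_geo,
            'geo_type': geo_type,
        }
    return users, total_tweets


def geo_summary(users):
    """Return the users with a non-empty primary_geo and their tweet total."""
    geo_users = [u for u in users.values() if u['primary_geo']]
    return geo_users, sum(u['tweets'] for u in geo_users)
```

Usage: `with open('test_with_sample.json') as f: users, total = summarize_users(f)`, then `geo_users, geo_tweets = geo_summary(users)`. Unlike the script, this version stores every user, so repeat tweets from users without geo data are counted too; `geo_summary` then restricts the totals to users whose first tweet produced a location, which is how the script decides `primary_geo`.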
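Exception handling was added so that the loop continues when a line has no `user` key. A missing key raises `KeyError` before `tweet['user']['id']` can evaluate to false, so the `if` test alone cannot skip such lines:

```python
for line in f:
    try:
        ...  # the body of the for loop from the script above
    except KeyError:
        continue  # skip this line and move on to the next one
```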
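Wrapping the whole loop body in `try`/`except KeyError` also swallows any other missing field (for example a `place` object without `country`), so a malformed tweet disappears silently. A narrower alternative, sketched below under the same one-object-per-line assumption, is to test for the `user` key before touching it and count what was skipped. The names `iter_tweets` and `skipped` are hypothetical.

```python
import json


def iter_tweets(lines, skipped):
    """Yield parsed objects that contain a 'user' field.

    Lines without one are counted in skipped['no_user'] and blank lines
    in skipped['blank'], so nothing is dropped without a trace.
    """
    for line in lines:
        line = line.strip()
        if not line:
            skipped['blank'] = skipped.get('blank', 0) + 1
            continue
        obj = json.loads(line)
        if 'user' not in obj:
            skipped['no_user'] = skipped.get('no_user', 0) + 1
            continue
        yield obj
```

In the script, `for line in f:` followed by `tweet = json.loads(line)` would become `for tweet in iter_tweets(f, skipped):` with `skipped = {}` defined beforehand; any other missing field then still raises a `KeyError` that points at the real problem.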