How to convert JSON (Twitter Data) to CSV using Python
I am attempting to query the Twitter search engine (search.twitter.com), convert the results into JSON, and then prepare the results as a CSV for a research project. I am a Python novice, but I have managed to code 2/3 of the program myself. However, I am having a difficult time converting my JSON file into the CSV format. I have tried various suggested techniques without success. What am I doing wrong here?
Here is what I have so far:
import twitter, os, json, csv

qname = raw_input("Please enter the term(s) you wish to search for: ")
date = int(raw_input("Please enter today's date (no dashes or spaces): "))
nname = raw_input("Please enter a nickname for this query (no spaces): ")
q1 = raw_input("Would you like to set a custom directory? Enter Yes or No: ")
if q1 == 'No' or 'no' or 'n' or 'N':
    dirname = 'C:\Users\isaac\Desktop\TPOP'
elif q1 == 'Yes' or 'yes' or 'y' or 'Y':
    dirname = raw_input("Please enter the directory path:")
ready = raw_input("Are you ready to begin? Enter Yes or No: ")
while ready == 'Yes' or 'yes' or 'y' or 'Y':
    twitter_search = twitter.Twitter(domain="search.Twitter.com")
    search_results = []
    for page in range(1, 10):
        search_results.append(twitter_search.search(q=qname, rpp=1, page=page))
    ready1 = raw_input("Done! Are you ready to continue? Enter Yes or No: ")
    if ready1 == 'Yes' or 'yes' or 'y' or 'Y':
        break
ready3 = raw_input("Do you want to save output as a file? Enter Yes or No: ")
while ready3 == 'Yes' or 'yes' or 'y' or 'Y':
    os.chdir(dirname)
    filename = 'results.%s.%06d.json' % (nname, date)
    t = open(filename, 'wb+')
    s = json.dumps(search_results, sort_keys=True, indent=2)
    print >> t, s
    t.close()
    ready4 = raw_input("Done! Are you ready to continue? Enter Yes or No: ")
    if ready4 == 'Yes' or 'yes' or 'y' or 'Y':
        break
ready5 = raw_input("Do you want to save output as a csv/excel file? Enter Yes or No: ")
while ready5 == 'Yes' or 'yes' or 'y' or 'Y':
    filename2 = 'results.%s.%06d.csv' % (nname, date)
    z = json.dumps(search_results, sort_keys=True, indent=2)
    x = json.loads(z)
    json_string = z
    json_array = x
    columns = set()
    for entity in json_array:
        if entity == "created_at" or "from_user" or "from_user_id" or "from_user_name" or "geo" or "id" or "id_str" or "iso_language_code" or "text":
            columns.update(set(entity))
    writer = csv.writer(open(filename2, 'wb+'))
    writer.writerow(list(columns))
    for entity in json_array:
        row = []
        for c in columns:
            if c in entity:
                row.append(str(entity[c]))
            else:
                row.append('')
You have several different problems going on.
First off, the syntax of
x == 'a' or 'b' or 'c'
probably doesn't do what you think it does. You should use
x in ('a', 'b', 'c')
instead.
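To see why the chained-or version misbehaves (a small illustration, not part of the original answer): a non-empty string literal like 'yes' is truthy on its own, so the whole expression is always true no matter what the user typed.

```python
answer = 'maybe'  # something that should NOT match

# The == only applies to the first operand; each remaining string
# literal is just a truthy value, so the whole or-chain is always true.
broken = (answer == 'Yes' or 'yes' or 'y' or 'Y')
print(bool(broken))  # True, even though answer is 'maybe'

# Membership testing compares the value against every alternative.
correct = answer in ('Yes', 'yes', 'y', 'Y')
print(correct)  # False
```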
Second, your ready5 variable never changes and won't work right in the loop. Try
while True:
    ready5 = raw_input("Do you want to save output as a csv/excel file? Enter Yes or No: ")
    if ready5 not in (...):
        break
And finally, there's something wrong with your dumping/loading code. What you're getting from Twitter should be a JSON string. There's some code you've left out of your question, so I can't tell for sure, but I don't think you want to be using json.dumps at all. You're reading from JSON (using json.loads) and writing to CSV (using csv.writer.writerow).
After some searching around, I found the answer here: http://michelleminkoff.com/2011/02/01/making-the-structured-usable-transform-json-into-a-csv/
The code should look something like this (if you are searching the Twitter Python API):
filename2 = '/path/to/my/file.csv'
writer = csv.writer(open(filename2, 'w'))
z = json.dumps(search_results, sort_keys=True, indent=2)
parsed_json = json.loads(z)
n = 0
# X should be the number of pages you pulled (n runs from 0 to X - 1).
while n < X:
    for tweet in parsed_json[n]['results']:
        row = []
        row.append(str(tweet['from_user'].encode('utf-8')))
        row.append(str(tweet['created_at'].encode('utf-8')))
        row.append(str(tweet['text'].encode('utf-8')))
        writer.writerow(row)
    n = n + 1
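The manual n/X counter can be dropped entirely by iterating over the parsed pages directly, which also removes the off-by-one risk. A minimal sketch, using stand-in data in place of the real parsed_json:

```python
import csv

# Stand-in for parsed_json: two "pages" of old-style search results.
parsed_json = [
    {"results": [{"from_user": "alice", "created_at": "t1", "text": "first"}]},
    {"results": [{"from_user": "bob", "created_at": "t2", "text": "second"}]},
]

with open("file.csv", "w") as f:
    writer = csv.writer(f)
    # Looping over the list itself visits every page, no counter needed.
    for page in parsed_json:
        for tweet in page["results"]:
            writer.writerow([tweet["from_user"], tweet["created_at"], tweet["text"]])
```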
Thanks everyone for the help!