Writing JSON files in one txt file with each JSON data on its own line
I am querying the Twitter API with a list of tweet IDs. I need to loop through the IDs to get the corresponding data from Twitter, then store the JSON data in a txt file where each tweet's JSON is on its own line. Later I will have to read the txt file line by line to create a pandas DataFrame from it.
Here is some fake data to show the structure.
twt.tweet_id.head()
0 000000000000000001
1 000000000000000002
2 000000000000000003
3 000000000000000004
4 000000000000000005
Name: tweet_id, dtype: int64
I don't know how to share the JSON files, or even whether I can. After calling tweet._json, what I get is the tweet's JSON data.
drop_lst = []  # collects the IDs which don't work
for i in twt.tweet_id:  # twt.tweet_id is the pd.Series with the IDs
    try:
        tweet = api.get_status(i)
        with open('tweet_json.txt', 'a') as f:
            f.write(str(tweet._json)+'\n')  # tweet._json is the JSON data I need
    except tp.TweepError:
        drop_lst.append(i)
The above works, but I think I have lost the JSON structure, which I need later to create the dataframe.
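A short illustration of why the structure is lost: str() on a dict produces a Python repr (single quotes, False, None), which a JSON parser rejects, whereas json.dumps produces valid JSON. This is only a sketch; the record below is a made-up stand-in for tweet._json, not real Twitter data.

```python
import json

# Hypothetical stand-in for tweet._json (a plain Python dict)
record = {"id": 1, "text": "it's here", "truncated": False, "geo": None}

as_repr = str(record)         # Python repr: single quotes, False, None
as_json = json.dumps(record)  # valid JSON: double quotes, false, null

# Try to parse the repr as JSON; this fails because it is not JSON
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

# The json.dumps output parses back to the original dict
round_trip = json.loads(as_json)
```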
drop_lst = []
for i in twt.tweet_id:
    try:
        tweet = api.get_status(i)
        with open('data.txt', 'a') as outfile:
            json.dump(tweet._json, outfile)
    except tp.TweepError:
        drop_lst.append(i)
The above doesn't put each record on its own line.
I hope I have provided enough information for you to help me.
Thank you in advance for all your help.
Appending JSON to a file using json.dump doesn't include newlines, so the records all wind up on the same line together. I'd recommend collecting all of your JSON records into a list, then using join and writing the result to a file:
tweets, drop_lst = [], []
for i in twt.tweet_id:
    try:
        tweet = api.get_status(i)
        tweets.append(tweet._json)
    except tp.TweepError:
        drop_lst.append(i)
with open('data.txt', 'a') as fh:
    # json.dumps emits each record on a single line; ending every record with
    # '\n' avoids blank lines (assumes the file is empty or ends with a newline)
    fh.write(''.join(json.dumps(tweet) + '\n' for tweet in tweets))
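If holding every tweet in memory is a concern, one variant is to write each record as soon as it is fetched, so a crash partway through loses nothing already written. The following is a minimal sketch, not the approach from the answer itself: write_jsonl and fake_fetch are hypothetical names, and fake_fetch stands in for api.get_status(i)._json so the example runs without Twitter access.

```python
import json
import os
import tempfile

def write_jsonl(ids, fetch, path):
    """Append one JSON record per line; return the IDs that failed."""
    failed = []
    with open(path, "a", encoding="utf-8") as fh:
        for i in ids:
            try:
                record = fetch(i)
            except Exception:  # with tweepy, catch tp.TweepError instead
                failed.append(i)
                continue
            # One record per line, each terminated by a newline
            fh.write(json.dumps(record) + "\n")
    return failed

def fake_fetch(i):
    # Hypothetical stand-in for api.get_status(i)._json
    if i == 2:
        raise ValueError("deleted tweet")
    return {"id": i, "text": f"tweet {i}"}

path = os.path.join(tempfile.mkdtemp(), "data.txt")
failed = write_jsonl([1, 2, 3], fake_fetch, path)

with open(path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()
```

Opening the file once outside the loop also avoids reopening it for every tweet.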
Then, to create your dataframe, you can read that file and stitch everything back together:
with open("data.txt") as fh:
    # skip blank lines; a bare "\n" is truthy, so test the stripped line
    tweets = [json.loads(line) for line in fh if line.strip()]
df = pd.DataFrame(tweets)
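As an alternative to parsing each line by hand, pandas can read a newline-delimited JSON file directly with read_json and lines=True. A minimal sketch, assuming the file holds exactly one JSON object per line; the two records and the temporary path are invented for illustration.

```python
import json
import os
import tempfile

import pandas as pd

# Build a small newline-delimited JSON file (made-up records)
records = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as fh:
    for record in records:
        fh.write(json.dumps(record) + "\n")

# lines=True tells pandas to treat every line as a separate JSON object
df = pd.read_json(path, lines=True)
```

Note that read_json may convert some columns (for example, date-like ones) automatically, so the manual json.loads approach gives more control over dtypes.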
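This assumes that each JSON record occupies exactly one line. json.dumps with its default settings guarantees that, because newlines inside the tweet text are escaped as \n, but be wary if the file was written some other way, for example pretty-printed with indent.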
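A quick check of how json.dumps treats newlines inside string values, using a made-up record: the default output contains no raw newline characters, while pretty-printed output with indent does and would break line-by-line reading.

```python
import json

# Made-up record whose text contains a newline
record = {"id": 1, "text": "first line\nsecond line"}

compact = json.dumps(record)           # default: single line, newline escaped
pretty = json.dumps(record, indent=2)  # pretty-printed across several lines

compact_newlines = compact.count("\n")  # raw newline characters in the output
pretty_newlines = pretty.count("\n")

# The escaped newline is restored when the line is parsed again
restored = json.loads(compact)
```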