Writing JSON files in one txt file with each JSON data on its own line

I am querying the Twitter API with a list of tweet IDs. I need to loop through the IDs to get the corresponding data from Twitter, then store that JSON data in a txt file where each tweet's JSON is on its own line. Later I will have to read the txt file line by line to create a pandas DataFrame from it.

Here is some fake data to show you the structure.

twt.tweet_id.head()

0    000000000000000001
1    000000000000000002
2    000000000000000003
3    000000000000000004
4    000000000000000005
Name: tweet_id, dtype: int64

I don't know how to share the JSON files, and I don't even know if I can. What I get from tweet._json is the tweet's JSON data.

drop_lst = []     # this is needed to collect the IDs which don't work


for i in twt.tweet_id:   # twt.tweet_id is the pd.series with the IDs
    try:
        tweet = api.get_status(i)
        with open('tweet_json.txt', 'a') as f:
            f.write(str(tweet._json)+'\n')  #  tweet._json is the JSON file I need

    except tp.TweepError:
        drop_lst.append(i)

The above works, but I think I have lost the JSON structure, which I need later to create the DataFrame.

drop_lst = []

for i in twt.tweet_id:
    try:
        tweet = api.get_status(i)
        with open('data.txt', 'a') as outfile:  
            json.dump(tweet._json, outfile)

    except tp.TweepError:
        drop_lst.append(i)

The above doesn't put each tweet's JSON on its own line.

I hope I have provided enough information for you to help me.

Thank you in advance for all your help.

Appending JSON to a file with json.dump doesn't add a newline, so the records all wind up on the same line. I'd recommend collecting all of your JSON records into a list, then using join to build a newline-delimited string and writing that to the file:

tweets, drop_lst = [], []

for i in twt.tweet_id:
    try:
        tweet = api.get_status(i)
        tweets.append(tweet._json)

    except tp.TweepError:
        drop_lst.append(i)

with open('data.txt', 'a') as fh:
    # json.dumps returns a single-line string; end every record with '\n'
    # so appended runs never share a line and no blank lines are written
    fh.write(''.join(json.dumps(tweet) + '\n' for tweet in tweets))
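
To see why the first attempt in the question loses the JSON structure, here is a minimal sketch that needs no Twitter access. The `fake_tweets` records and the `write_json_lines` helper are made up for illustration, and it assumes `tweet._json` behaves like a plain Python dict:

```python
import json
import os
import tempfile

# Made-up records standing in for tweet._json (assumed to be a plain dict).
fake_tweets = [
    {"id": 1, "text": "first tweet", "retweeted": False},
    {"id": 2, "text": "second tweet", "retweeted": True},
]

def write_json_lines(records, path):
    """Append each record to `path` as one JSON document per line."""
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")

# str() produces the Python repr (single quotes, False), which is not valid JSON.
as_repr = str(fake_tweets[0])
# json.dumps produces real JSON (double quotes, false).
as_json = json.dumps(fake_tweets[0])

path = os.path.join(tempfile.mkdtemp(), "data.txt")
write_json_lines(fake_tweets, path)

with open(path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()
```

Passing `as_repr` to `json.loads` raises `json.JSONDecodeError`, while every entry in `lines` parses back to the original dict.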

Then, to create your DataFrame, you can read that file and stitch everything back together:

with open("data.txt") as fh:
    # line.strip() skips blank lines; a bare "\n" is truthy and would break json.loads
    tweets = [json.loads(line) for line in fh if line.strip()]

df = pd.DataFrame(tweets)
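
If the file might contain blank lines or lines written with str() instead of json.dumps, a slightly more defensive reader can help. This is only a sketch: `read_json_lines` is a hypothetical helper, and the sample file contents are made up.

```python
import json
import os
import tempfile

def read_json_lines(path):
    """Return (records, bad_lines); bad_lines holds (line_number, text) pairs."""
    records, bad_lines = [], []
    with open(path, encoding="utf-8") as fh:
        for number, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:  # tolerate blank lines, e.g. a leading "\n"
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                bad_lines.append((number, line))
    return records, bad_lines

# Made-up file: one blank line, two valid records, one Python-repr line.
sample = '\n{"id": 1, "text": "a"}\n{"id": 2, "text": "b"}\n' + "{'id': 3}\n"
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(sample)

records, bad_lines = read_json_lines(path)
```

The `records` list can be passed to pd.DataFrame exactly as above. pandas can also read such a file directly with pd.read_json(path, lines=True), though I haven't checked how it reacts to malformed lines.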

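This relies on each record occupying exactly one line. With its default arguments, json.dumps guarantees that: line breaks inside the tweet text are escaped as \n, and no line breaks are added between keys. Just don't pass indent, which would spread a record over several lines.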
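
Whether a line break inside a tweet can split a record is easy to check with made-up data. A small sketch (the `record` dict is invented for illustration):

```python
import json

# Made-up tweet-like record whose text contains a real line break.
record = {"id": 1, "text": "line one\nline two"}

# Default arguments: the line break is escaped as the two characters \ and n.
compact = json.dumps(record)

# indent adds real line breaks between keys, so it is unsuitable
# for a one-record-per-line file.
pretty = json.dumps(record, indent=2)

# Parsing the compact form restores the original line break.
round_trip = json.loads(compact)
```

Here `compact` contains no newline character, `pretty` does, and `round_trip` equals `record`.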
