Python reading records from a json file and writing to two separate json files

I have a Twitter JSON file and I'm trying to separate the English and French tweets into two separate files. I'm using Python 2.7 with the following code:

import json

with open('tweets.json', 'r') as f:
    with open('english.json', 'w') as enF:
        with open('french.json', 'w') as frF:
            for line in f:
                tweet = json.loads(line)

                if tweet["lang"] == "en":
                    json.dump(tweet, enF, sort_keys=True)
                elif tweet["lang"] == "fr":
                    json.dump(tweet, frF, sort_keys=True)

This produces the two separate JSON files, one with English tweets and the other with French, which I have checked. The original file has one tweet per line. The english.json and french.json files each contain all of their tweets on a single line. I'm not sure if that will be a problem, and I'm not even confident that this is correct. So I passed english.json through this code again (obviously I changed the file name) and it gives an error:

Traceback (most recent call last):
  File "C:\Users\jack\Desktop\twitClean\j4.py", line 10, in <module>
    tweet = json.loads(line)
  File "C:\Python27\lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 4926 - line 1 column 691991 (char 4925 - 691990)

I've been working on this for the past three days and have come up with nothing. Can anyone please help and tell me what I'm doing wrong?
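The "Extra data" error is consistent with how `json.dump` behaves: it writes the object and nothing else, so consecutive calls place the tweets back to back on one line, and `json.loads` stops after the first complete object. A minimal sketch that reproduces this in memory (the sample tweets are made up for illustration):

```python
import json

# Made-up records standing in for real tweets.
tweets = [{"lang": "en", "text": "hello"}, {"lang": "en", "text": "bye"}]

# json.dump(obj, fp) writes the same text that json.dumps(obj) returns,
# with no trailing newline, so repeated calls run the objects together.
one_line = "".join(json.dumps(t, sort_keys=True) for t in tweets)

try:
    json.loads(one_line)
    failed = False
except ValueError:  # json.JSONDecodeError subclasses ValueError on Python 3
    failed = True

# Terminating each record with "\n" keeps one tweet per line,
# so the output can be read back line by line.
per_line = "".join(json.dumps(t, sort_keys=True) + "\n" for t in tweets)
reloaded = [json.loads(line) for line in per_line.splitlines()]
```

In the loop from the question, that would probably mean following each `json.dump(tweet, enF, sort_keys=True)` with `enF.write("\n")`, and likewise for `frF`.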

What about loading the JSON file like this?

import json

with open('tweets.json', 'r') as f:
    tweets_dict = json.load(f)

Then, given that the Python-native representation of a JSON object is a dictionary, you can iterate over it and build your French and English dictionaries as well. I mean, doing:

fr_dict, en_dict, ot_dict = {}, {}, {}
for id_, tweet in tweets_dict.items():
    if tweet['lang'] == 'fr':
        fr_dict[id_] = tweet
    elif tweet['lang'] == 'en':
        en_dict[id_] = tweet
    else:
        ot_dict[id_] = tweet 

with open('french.json', 'w') as frF:
    json.dump(fr_dict, frF, sort_keys=True) 

with open('english.json', 'w') as enF:
    json.dump(en_dict, enF, sort_keys=True)

with open('other.json', 'w') as otF:
    json.dump(ot_dict, otF, sort_keys=True)
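Note that `json.load(f)` expects the whole file to be a single JSON document, and the loop above further assumes that document is an object mapping ids to tweets. The question describes one tweet per line instead, so a variant that parses line by line may fit better. A rough sketch, where `group_by_lang` and the sample lines are made up for illustration:

```python
import json
from collections import defaultdict


def group_by_lang(lines):
    """Group line-delimited JSON tweets into {lang: [tweet, ...]}."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():  # ignore blank lines
            tweet = json.loads(line)
            # Tweets without a "lang" key are filed under "other" (an assumption).
            groups[tweet.get("lang", "other")].append(tweet)
    return groups


# Made-up sample input; with a real file, pass the open file object instead.
sample = [
    '{"lang": "fr", "text": "salut"}\n',
    '{"lang": "en", "text": "hi"}\n',
    '{"lang": "fr", "text": "merci"}\n',
]
groups = group_by_lang(sample)

# Each group serializes as one JSON array, which json.load can read back.
fr_json = json.dumps(groups["fr"], sort_keys=True)
```

Each list can then be written with a single `json.dump(groups["fr"], frF, sort_keys=True)`, which gives a file holding one valid JSON array rather than many concatenated objects.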

SOLVED: Unfortunately, being only a Python hacker, I could not solve this using Python, though I'm sure there must be a way. So if someone else needs such a solution, here it is. The solution I found uses jq as follows:
cat jsonfile | jq '. | select(.lang=="en")' > savefile

Obviously, with this command the JSON file has to be read twice (once with .lang=="en" and once with .lang=="fr"), as I need the English and French tweets in separate files.
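For anyone who would rather stay in Python, a single pass over the input seems possible by writing each tweet to the matching output file as it is read, with a newline after every record. This is only a sketch: `split_by_lang` is a hypothetical helper, the sample tweets are made up, and the in-memory demo uses `io.StringIO` as it behaves on Python 3.

```python
import io
import json


def split_by_lang(src, outputs):
    """Write each tweet from src to outputs[lang], one JSON object per line.

    src: iterable of JSON lines. outputs: dict mapping a language code to a
    writable file object. Tweets in any other language are skipped.
    """
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        tweet = json.loads(line)
        out = outputs.get(tweet.get("lang"))
        if out is not None:
            out.write(json.dumps(tweet, sort_keys=True) + "\n")


# In-memory demo with made-up tweets; real code would pass open files instead.
src = io.StringIO(
    '{"lang": "en", "text": "hello"}\n'
    '{"lang": "fr", "text": "bonjour"}\n'
    '{"lang": "de", "text": "hallo"}\n'
)
en_out, fr_out = io.StringIO(), io.StringIO()
split_by_lang(src, {"en": en_out, "fr": fr_out})
```

With the files from the question, the call would be `split_by_lang(f, {"en": enF, "fr": frF})` inside the same `with` blocks, so tweets.json is read only once.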
