Python: read records from a JSON file and write them to two separate JSON files

Question

I have a Twitter JSON file and I'm trying to separate the English and French tweets into two separate files. I'm using Python 2.7 with the following code:

import json

with open('tweets.json', 'r') as f:
    with open('english.json', 'w') as enF:
        with open('french.json', 'w') as frF:
            for line in f:
                tweet = json.loads(line)

                if tweet["lang"] == "en":
                    json.dump(tweet, enF, sort_keys=True)
                elif tweet["lang"] == "fr":
                    json.dump(tweet, frF, sort_keys=True)

This produces two separate JSON files, one with the English tweets and the other with the French ones, which I have checked. The original file has one tweet per line. The english.json and french.json files each contain all of their tweets on a single line. I'm not sure whether that will be a problem, and I'm not even confident that the output is correct. So I passed english.json through this code again (after changing the input file name) and it gives an error:

Traceback (most recent call last):
  File "C:\Users\jack\Desktop\twitClean\j4.py", line 10, in <module>
    tweet = json.loads(line)
  File "C:\Python27\lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 4926 - line 1 column 691991 (char 4925 - 691990)

I've been working on this for the past three days and have come up with nothing. Can anyone please help and tell me what I'm doing wrong?

Answer 1

What about loading the JSON file like this?

import json

with open('tweets.json', 'r') as f:
    tweets_dict = json.load(f)

Then, given that the Python-native representation of a JSON object is a dictionary, you can iterate over it and build separate French and English dictionaries. This assumes tweets.json holds a single JSON object keyed by tweet id. I mean, doing:

fr_dict, en_dict, ot_dict = {}, {}, {}
for id_,tweet in tweets_dict.items():
    if tweet['lang'] == 'fr':
        fr_dict[id_] = tweet
    elif tweet['lang'] == 'en':
        en_dict[id_] = tweet
    else:
        ot_dict[id_] = tweet 

with open('french.json', 'w') as frF:
    json.dump(fr_dict, frF, sort_keys=True) 

with open('english.json', 'w') as enF:
    json.dump(en_dict, enF, sort_keys=True)

with open('other.json', 'w') as otF:
    json.dump(ot_dict, otF, sort_keys=True)
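A note on the "Extra data" error from the question, as a minimal sketch: back-to-back json.dump calls write no separator, so the output file ends up holding several JSON objects glued together on one line, and json.loads refuses anything after the first one. The two sample tweets below are made up for illustration; writing a newline after each object keeps the one-tweet-per-line layout readable.

```python
import json

# Two made-up tweets, for illustration only.
first = {"lang": "en", "text": "hello"}
second = {"lang": "en", "text": "world"}

# What back-to-back json.dump calls leave in the file: no separator at all.
concatenated = json.dumps(first) + json.dumps(second)

try:
    json.loads(concatenated)
    error = None
except ValueError as exc:  # on Python 3, json.JSONDecodeError subclasses ValueError
    error = str(exc)

print(error)

# One object per line parses cleanly when read back line by line.
per_line = json.dumps(first) + "\n" + json.dumps(second) + "\n"
parsed = [json.loads(line) for line in per_line.splitlines()]
```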

Answer 2

SOLVED: Unfortunately, being only a Python hacker, I could not solve this using Python. I'm sure there must be a way using Python. So if someone else needs such a solution, here it is. The solution I found was using jq as follows:
cat jsonfile | jq '. | select(.lang=="en")' > savefile

Obviously, with this command the JSON file has to be read twice, since I need the English and French tweets in separate files.
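For anyone who wants to stay in Python, here is a rough single-pass sketch. It assumes the input has one JSON object per line, as described in the question, and that each tweet carries a "lang" key. The helper name split_by_lang and its out_paths argument are made up for this example and are not part of any library.

```python
import json


def split_by_lang(src_path, out_paths):
    """Copy each tweet in src_path to the file registered for its "lang".

    out_paths maps a language code (e.g. "en") to an output file path.
    Returns the number of tweets written per language.
    """
    outputs = {lang: open(path, "w") for lang, path in out_paths.items()}
    counts = dict.fromkeys(out_paths, 0)
    try:
        with open(src_path) as src:
            for line in src:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                tweet = json.loads(line)
                lang = tweet.get("lang")
                if lang in outputs:
                    # The trailing newline keeps one tweet per line, so the
                    # output can be read back the same way as the input.
                    outputs[lang].write(json.dumps(tweet, sort_keys=True) + "\n")
                    counts[lang] += 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

It could be called as split_by_lang('tweets.json', {'en': 'english.json', 'fr': 'french.json'}); tweets in any other language are simply skipped, and the input is read only once.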
