
Manage newlines and indentation in JSON

I'm writing some Python code (data extraction from a CoNLL-U format file) and I want to store my data in a .json file. I'd like the output to be formatted as follows (x are keys, y are values):

{
    "lemma": {"x": "y","x": [{"x":"y"}], "x": "y", "x": [{"x":"y"}], "x":  "" },
    "lemma1":{"x": "y", "x": [{"x":"y"}], "x": "y", "x": [{"x":"y"}], "x":  "y" }...
}

Here is the last section of my code (it's probably quite inefficient, but for now I'm only interested in formatting the JSON output):

import json

token_info = {}

...

sentences = []
tokens = []
idn_dep_dict = {}

for line in lines:
    if line == '\n':  # a blank line ends the current sentence
        sentences.append(tokens)
        tokens = []
    else:
        fields = line.strip().split('\t')
        # Keep only token lines, whose first field is a plain integer ID
        if fields[0].isdigit():
            idn = fields[0]
            lemma = fields[1]
            upos = fields[3]
            xpos = fields[4]
            feats = fields[5]
            dep = fields[6]

            pos_pair = (upos, xpos)
            tokens.append((idn, lemma, pos_pair, feats, dep))
            idn_dep_dict[idn] = [dep]

for sentence in sentences:
    dependencies_dict = {}  # dictionary for the dependencies of the current sentence
    for token in sentence:
        idn, lemma, pos_pair, feats, dep = token
        if dep == '0':
            dependencies_dict[idn] = 'root'
        if dep in idn_dep_dict:
            for head_token in sentence:
                if head_token[0] == dep:
                    dependencies_dict[idn] = head_token[2]

        # Create a dictionary for the current token's information.
        # pos_pair[0] is this token's upos; the bare name upos would still
        # hold the value of the last line parsed in the loop above.
        current_token = {'x1': [pos_pair[0]], 'x2': [{'0': pos_pair}], 'x3': [{'0': dependencies_dict[idn]}], 'x4': feats}
        token_info[lemma] = current_token

# Write the JSON data to a file
with open('token_info.json', 'w', encoding='utf-8') as f:
    json.dump(token_info, f, ensure_ascii=False, indent=2, separators=(',', ': '))

The current code puts a newline after each bracket, brace, and comma in the JSON file. I'd like each "lemma": {corresponding dictionary} pair to be on its own line instead. Is that possible? Thank you all in advance.

Serialize one level of the dictionary structure manually, like this:

import json
token_info = json.loads('''
{
    "lemma": {"x": "y","x2": [{"x":"y"}], "x3": "y", "x4": [{"x":"y"}], "x5":  "" },
    "lemma1":{"x": "y", "x2": [{"x":"y"}], "x3": "y", "x4": [{"x":"y"}], "x5":  "y" }
}
''')

lines = []
for k, v in token_info.items():
    ks = json.dumps(k, ensure_ascii=False)
    vs = json.dumps(v, ensure_ascii=False, separators=(',', ': '))
    lines.append(ks + ': ' + vs)
src = '{\n    ' + (',\n    '.join(lines)) + '\n}'
print(src)

This will output the following:

{
    "lemma": {"x": "y","x2": [{"x": "y"}],"x3": "y","x4": [{"x": "y"}],"x5": ""},
    "lemma1": {"x": "y","x2": [{"x": "y"}],"x3": "y","x4": [{"x": "y"}],"x5": "y"}
}
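
If this is needed in more than one place, the same idea can be wrapped in a small helper. The following is only a sketch: the function names `dumps_one_entry_per_line` and `dump_one_entry_per_line` are made up for illustration, the top-level keys are assumed to be strings (as the lemmas are here), and the nested values use the default `json.dumps` separators rather than the `(',', ': ')` pair used above.

```python
import json


def dumps_one_entry_per_line(mapping, indent=4):
    """Return JSON text with one top-level key/value pair per line.

    Hypothetical helper; assumes every top-level key is a str.
    """
    if not mapping:
        return "{}"
    pad = " " * indent
    entries = []
    for key, value in mapping.items():
        # The key goes through json.dumps too, so quotes and escapes stay valid.
        key_text = json.dumps(key, ensure_ascii=False)
        # No indent argument here, so each value is serialized on a single line.
        value_text = json.dumps(value, ensure_ascii=False)
        entries.append(key_text + ": " + value_text)
    return "{\n" + pad + (",\n" + pad).join(entries) + "\n}"


def dump_one_entry_per_line(mapping, path, indent=4):
    """Write the same text to a file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(dumps_one_entry_per_line(mapping, indent) + "\n")


sample = {
    "lemma": {"x1": ["NOUN"], "x4": ""},
    "lemma1": {"x1": ["VERB"], "x4": "y"},
}
text = dumps_one_entry_per_line(sample)
print(text)

# The result is still ordinary JSON, so it parses back to the same data.
assert json.loads(text) == sample
```

Calling `dump_one_entry_per_line(token_info, 'token_info.json')` would then take the place of the `json.dump(...)` call in the question.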
