Manage newlines and indentation in JSON
I'm writing some Python code (data extraction from a CoNLL-U format file) and I want my data to be stored in a .json file. I'd like to achieve an output format like the following (x are keys, y are values):
{
"lemma": {"x": "y","x": [{"x":"y"}], "x": "y", "x": [{"x":"y"}], "x": "" },
"lemma1":{"x": "y", "x": [{"x":"y"}], "x": "y", "x": [{"x":"y"}], "x": "y" }...
}
Last section of my code (it's probably quite inefficient, but for now I'm just interested in formatting the JSON output):
token_info = {}
...
sentences = []
tokens = []
idn_dep_dict = {}
for line in lines:
    if line == '\n':
        sentences.append(tokens)
        tokens = []
    else:
        fields = line.strip().split('\t')
        if len(fields) >= 1:
            if fields[0].isdigit():
                idn = fields[0]
                lemma = fields[1]
                upos = fields[3]
                xpos = fields[4]
                feats = fields[5]
                dep = fields[6]
                pos_pair = (upos, xpos)
                tokens.append((idn, lemma, pos_pair, feats, dep))
                idn_dep_dict[idn] = [dep]
            else:
                continue
for sentence in sentences:
    dependencies_dict = {}  # dictionary for the dependencies of the current sentence
    for token in sentence:
        idn, lemma, pos_pair, feats, dep = token
        if dep == '0':
            dependencies_dict[idn] = 'root'
        if dep in idn_dep_dict:
            for head_token in sentence:
                if head_token[0] == dep:
                    dependencies_dict[idn] = head_token[2]
        # Create a dictionary for the current token's information
        current_token = {'x1': [upos], 'x2': [{'0': pos_pair}], 'x3': [{'0': dependencies_dict[idn]}], 'x4': feats}
        token_info[lemma] = current_token
# Write the JSON data to a file
with open('token_info.json', 'w', encoding='utf-8') as f:
    json.dump(token_info, f, ensure_ascii=False, indent=2, separators=(',', ': '))
The current code generates a newline after each [ or ], each { or }, and each comma in the JSON file. I'd like to have each lemma: {corresponding dictionary} pair on its own line. Is it possible? Thank you all in advance.
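Serialize one level of the dictionary structure manually, like this: dump each top-level key and its value separately without indent, then join the resulting strings with newlines yourself.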
import json
token_info = json.loads('''
{
"lemma": {"x": "y","x2": [{"x":"y"}], "x3": "y", "x4": [{"x":"y"}], "x5": "" },
"lemma1":{"x": "y", "x2": [{"x":"y"}], "x3": "y", "x4": [{"x":"y"}], "x5": "y" }
}
''')
lines = []
for k, v in token_info.items():
    ks = json.dumps(k, ensure_ascii=False)
    vs = json.dumps(v, ensure_ascii=False, separators=(',', ': '))
    lines.append(ks + ': ' + vs)
src = '{\n ' + (',\n '.join(lines)) + '\n}'
print(src)
This will output the following.
{
 "lemma": {"x": "y","x2": [{"x": "y"}],"x3": "y","x4": [{"x": "y"}],"x5": ""},
 "lemma1": {"x": "y","x2": [{"x": "y"}],"x3": "y","x4": [{"x": "y"}],"x5": "y"}
}
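Since the question writes to a file rather than printing, the same idea can be wrapped in a small helper. The following is only a sketch: the function name `dumps_one_entry_per_line`, the sample `token_info` values, and the use of a temporary directory are assumptions made for illustration, not part of the original code. It also assumes all top-level keys are strings, as lemmas are.

```python
import json
import os
import tempfile


def dumps_one_entry_per_line(data, indent=" "):
    """Serialize a dict so that each top-level key/value pair sits on one line.

    Hypothetical helper; assumes every top-level key is a string.
    """
    lines = []
    for key, value in data.items():
        ks = json.dumps(key, ensure_ascii=False)
        # No indent here, so the nested value stays on a single line.
        vs = json.dumps(value, ensure_ascii=False, separators=(",", ": "))
        lines.append(f"{indent}{ks}: {vs}")
    return "{\n" + ",\n".join(lines) + "\n}"


# Made-up sample data shaped loosely like the question's token_info.
token_info = {
    "lemma": {"x1": ["NOUN"], "x4": "_"},
    "lemma1": {"x1": ["VERB"], "x4": "_"},
}

text = dumps_one_entry_per_line(token_info)
print(text)

# Write the text to a file, then read it back to confirm it is still valid JSON.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "token_info.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        assert json.load(f) == token_info
```

In the question's script, the `json.dump(...)` call would be replaced by `f.write(dumps_one_entry_per_line(token_info))`. Because the result is ordinary JSON, `json.load` reads it back unchanged.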