Convert a JSON format file to TSV using Python

I have a corpus.json file that I need to convert to TSV format. It is a huge file and looks like this:

{'0': {'metadata': {'id': 'fQ3JoXLXxc4', 'title': '| Board Questions | 12 Maths | Equivalence Class | Equivalence Class Board Questions |', 'tags': ['Board Questions', '12 maths', '12 maths Board Questions', 'Previous Year Board Questions', 'Maths Board Questions', 'Board questions based on Equivalence Classes', 'Equivalence Class', 'Equivalence Classes in hindi'], 'description': 'Board Questions, 12 maths, 12 maths Board Questions, Previous Year Board Questions, Maths Board Questions, Board questions based on Equivalence Classes, Equivalence Class, Equivalence Classes in hindi, Equivalence Class for 12 maths, NCERT CBSE XII Maths,'}}, '1': {'subtitles': ' in this video were going to start taking a look at entropy and tropi and more specifically the kind of entropy we are going to be interested in is information entropy information entropy as opposed to another kind of entropy which you may have heard a probably heard of thermodynamic entropy information entropy comes up in the context of information theory there is actually a direct connection with thermodynamic entropy but were not going to address that here so what is entropy what is information entropy well you can think about it sort of intuitively as the uncertainty uncertainty put that in quotes since we dont really have a definition for uncertainty but you can think about it as the uncertainty in a random variable or random quantity or equivalently you can think about it as the information ....and so on

I am using the following code:

import json
import csv
with open('Downloads/corpus.json') as json_file:  
    j = json.load(json_file)
with open('output.tsv', 'w') as output_file:
    dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
    dw.writeheader()
    dw.writerows(j)

I get the following error:

 ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-110-a9cb3b17fdd1> in <module>()
      2     dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
      3     dw.writeheader()
----> 4     dw.writerows(j)

~/anaconda3/lib/python3.6/csv.py in writerows(self, rowdicts)
    156 
    157     def writerows(self, rowdicts):
--> 158         return self.writer.writerows(map(self._dict_to_list, rowdicts))
    159 
    160 # Guard Sniffer's type checking against builds that exclude complex()

~/anaconda3/lib/python3.6/csv.py in _dict_to_list(self, rowdict)
    146     def _dict_to_list(self, rowdict):
    147         if self.extrasaction == "raise":
--> 148             wrong_fields = rowdict.keys() - self.fieldnames
    149             if wrong_fields:
    150                 raise ValueError("dict contains fields not in fieldnames: "

AttributeError: 'str' object has no attribute 'keys'

What should be changed in this code? Or is there another way to do this?

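j is a JSON-like object; it is a dictionary. Without knowing exactly what you are trying to do, I don't think you need py_str=json.dumps(j), because that converts the JSON-like dictionary back into a string, and a string has no keys.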

Python json documentation

Some example commands from an interactive session:

>>> import json
>>> py_str = json.loads('{ "a": "b", "c": "d"}')
>>> py_str
{'a': 'b', 'c': 'd'}
>>> json.dumps(py_str)
'{"a": "b", "c": "d"}'
>>> py_str.keys()
dict_keys(['a', 'c'])
>>> json.dumps(py_str)[0]
'{'  # This is the cause of the failure
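Note that the code as posted in the question never calls json.dumps, yet it raises the same error. Iterating over a dictionary yields its keys, which are strings, so dw.writerows(j) ends up calling .keys() on each string. The sketch below reproduces this with a made-up two-entry dictionary standing in for the parsed corpus (the data and the in-memory buffer are assumptions for illustration, not the real file):

```python
import csv
import io

# Hypothetical stand-in for the parsed corpus (not the real data).
j = {"0": {"metadata": {"id": "abc"}}, "1": {"subtitles": "some text"}}

buffer = io.StringIO()  # in-memory file, used here instead of output.tsv
dw = csv.DictWriter(buffer, sorted(j.keys()), delimiter="\t")
dw.writeheader()

# Iterating over a dict yields its keys, so each "row" here is a str.
rows_seen = [type(row).__name__ for row in j]

try:
    dw.writerows(j)  # same call as in the question
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Wrapping the dict in a list passes it as one row of the expected type.
dw.writerows([j])
output = buffer.getvalue()
```

With the list wrapper the call succeeds, although each cell then holds the string form of a nested dictionary, which may or may not be the layout you want.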

I'm not sure whether I am missing something here, but in this code block:

with open('Downloads/corpus.json') as json_file:  
    j = json.load(json_file)

your j is a dictionary containing the JSON data. But on this line:

py_str=json.dumps(j)

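you are converting that dictionary into a string (essentially undoing what you just did). The error you see is telling you that a string has no keys.

When calling the keys() method, you should use j rather than py_str.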

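Your code is correct. The only problem is that you are trying to convert the JSON dict object back into a str, which, as mentioned in the other answer, makes no sense at all.

What are you trying to do with sorted(py_str[0].keys())? Try it without the [0].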

A minor detail: you can use a single with statement instead of two:

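import json
import csv

# newline='' is what the csv module documentation recommends for output files.
with open('output.tsv', 'w', newline='') as output_file, open('Downloads/corpus.json') as json_file:
    json_dict = json.load(json_file)
    dw = csv.DictWriter(output_file, sorted(json_dict.keys()), delimiter='\t')
    dw.writeheader()
    # writerows() expects an iterable of dicts; passing the dict itself
    # iterates over its string keys and raises the AttributeError above.
    dw.writerows([json_dict])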
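
If the goal is one TSV row per top-level entry rather than a single very wide row, the nested dictionaries have to be flattened first. What follows is only a sketch under assumptions: it assumes every entry looks like the sample in the question (an optional 'metadata' dict and an optional 'subtitles' string) and that the top-level keys are numeric strings. The helper names flatten_entry and write_tsv, the column names, and the sample data are all made up for illustration.

```python
import csv
import io

# Hypothetical column layout, based only on the sample shown in the question.
FIELDNAMES = ["key", "id", "title", "tags", "description", "subtitles"]


def flatten_entry(key, entry):
    """Turn one top-level entry into a flat dict of strings."""
    metadata = entry.get("metadata", {})
    return {
        "key": key,
        "id": metadata.get("id", ""),
        "title": metadata.get("title", ""),
        # Join the tag list so it fits in a single TSV cell.
        "tags": ", ".join(metadata.get("tags", [])),
        "description": metadata.get("description", ""),
        "subtitles": entry.get("subtitles", "").strip(),
    }


def write_tsv(corpus, output_file):
    """Write one row per top-level key of the parsed JSON dictionary."""
    dw = csv.DictWriter(output_file, FIELDNAMES, delimiter="\t")
    dw.writeheader()
    # Assumption: keys are numeric strings, so sort them as integers
    # to make "10" come after "2".
    for key in sorted(corpus, key=int):
        dw.writerow(flatten_entry(key, corpus[key]))


# Small made-up sample in place of the real parsed corpus.json.
sample = {
    "0": {"metadata": {"id": "abc", "title": "Demo", "tags": ["a", "b"]}},
    "1": {"subtitles": " some transcript text "},
}
buffer = io.StringIO()
write_tsv(sample, buffer)
tsv_text = buffer.getvalue()
```

To use this on the real file, you would pass the dictionary returned by json.load and a file opened with open('output.tsv', 'w', newline='') in place of the sample and the in-memory buffer.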
