Convert a JSON format file to TSV using Python

I have a corpus.json file that needs to be converted to TSV format. It is a huge file and looks like this:

{'0': {'metadata': {'id': 'fQ3JoXLXxc4', 'title': '| Board Questions | 12 Maths | Equivalence Class | Equivalence Class Board Questions |', 'tags': ['Board Questions', '12 maths', '12 maths Board Questions', 'Previous Year Board Questions', 'Maths Board Questions', 'Board questions based on Equivalence Classes', 'Equivalence Class', 'Equivalence Classes in hindi'], 'description': 'Board Questions, 12 maths, 12 maths Board Questions, Previous Year Board Questions, Maths Board Questions, Board questions based on Equivalence Classes, Equivalence Class, Equivalence Classes in hindi, Equivalence Class for 12 maths, NCERT CBSE XII Maths,'}}, '1': {'subtitles': ' in this video were going to start taking a look at entropy and tropi and more specifically the kind of entropy we are going to be interested in is information entropy information entropy as opposed to another kind of entropy which you may have heard a probably heard of thermodynamic entropy information entropy comes up in the context of information theory there is actually a direct connection with thermodynamic entropy but were not going to address that here so what is entropy what is information entropy well you can think about it sort of intuitively as the uncertainty uncertainty put that in quotes since we dont really have a definition for uncertainty but you can think about it as the uncertainty in a random variable or random quantity or equivalently you can think about it as the information ....and so on

I am using the following code:

import json
import csv
with open('Downloads/corpus.json') as json_file:  
    j = json.load(json_file)
with open('output.tsv', 'w') as output_file:
    dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
    dw.writeheader()
    dw.writerows(j)

I get the following error:

 ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-110-a9cb3b17fdd1> in <module>()
      2     dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
      3     dw.writeheader()
----> 4     dw.writerows(j)

~/anaconda3/lib/python3.6/csv.py in writerows(self, rowdicts)
    156 
    157     def writerows(self, rowdicts):
--> 158         return self.writer.writerows(map(self._dict_to_list, rowdicts))
    159 
    160 # Guard Sniffer's type checking against builds that exclude complex()

~/anaconda3/lib/python3.6/csv.py in _dict_to_list(self, rowdict)
    146     def _dict_to_list(self, rowdict):
    147         if self.extrasaction == "raise":
--> 148             wrong_fields = rowdict.keys() - self.fieldnames
    149             if wrong_fields:
    150                 raise ValueError("dict contains fields not in fieldnames: "

AttributeError: 'str' object has no attribute 'keys'

What should be changed in this code? Or is there another way to do this?

j is your JSON-like object; it's a dictionary. Without knowing exactly what you're trying to do, I think you don't need py_str=json.dumps(j), as that turns your JSON-like dict back into a string (which doesn't have keys).

Python json documentation

Some example interactive terminal commands:

>>> import json
>>> py_str = json.loads('{ "a": "b", "c": "d"}')
>>> py_str
{'a': 'b', 'c': 'd'}
>>> json.dumps(py_str)
'{"a": "b", "c": "d"}'
>>> py_str.keys()
dict_keys(['a', 'c'])
>>> json.dumps(py_str)[0]
'{'  # This is the cause of the failure

I'm not sure if I'm missing something here, but in this block:

with open('Downloads/corpus.json') as json_file:  
    j = json.load(json_file)

your j is a dictionary containing the JSON data. But in this line:

py_str=json.dumps(j)

you're converting that dict to a string (essentially undoing what you just did). The error you're seeing says that strings don't have keys.

You should use j instead of py_str when calling the keys() method.
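
Note that passing j itself to writerows() raises the same AttributeError, because iterating over a dict yields its keys, which are strings; this matches the traceback, where the failing call is dw.writerows(j). A minimal sketch reproducing it, using a made-up two-entry dict in place of the real corpus and an in-memory buffer in place of output.tsv:

```python
import csv
import io

# Hypothetical stand-in for the loaded corpus: a dict of dicts keyed by strings.
j = {"0": {"metadata": {"id": "abc"}}, "1": {"subtitles": "some text"}}

# Iterating over a dict yields its keys, not its values.
item_types = [type(item).__name__ for item in j]
print(item_types)

buffer = io.StringIO()  # in-memory stand-in for the output file
dw = csv.DictWriter(buffer, sorted(j.keys()), delimiter="\t")
dw.writeheader()

try:
    # Each "row" handed to DictWriter is a str key, which has no .keys() method.
    dw.writerows(j)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

So the rows passed to writerows() need to be dicts themselves, not the keys of the outer dict.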

Your code is correct. The only problem is that you are converting the JSON dict object back to a str, as mentioned in another answer, which doesn't make sense.

What did you want to achieve with sorted(py_str[0].keys())? Try it without [0].

Small detail: you can use one with statement instead of two:

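import json
import csv

# One with statement opens both files; newline='' lets the csv module control line endings.
with open('output.tsv', 'w', newline='') as output_file, open('Downloads/corpus.json') as json_file:
    json_dict = json.load(json_file)
    dw = csv.DictWriter(output_file, sorted(json_dict.keys()), delimiter='\t')
    dw.writeheader()
    # json_dict is a single dict, so write it as one row;
    # writerows() expects an iterable of dicts and would iterate over the string keys.
    dw.writerow(json_dict)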
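
For a dict of dicts like the sample corpus, one option is to build one row per top-level key before handing the rows to DictWriter. The question does not say what the TSV should look like, so the layout here (a key column plus one column per nested field, with non-string values serialized as JSON text) is an assumption, and corpus_to_rows is a hypothetical helper name. A minimal sketch on a made-up two-entry corpus:

```python
import csv
import io
import json

# Hypothetical miniature of the corpus shown in the question.
corpus = {
    "0": {"metadata": {"id": "fQ3JoXLXxc4", "tags": ["Board Questions", "12 maths"]}},
    "1": {"subtitles": "in this video were going to start taking a look at entropy"},
}


def corpus_to_rows(data):
    """Yield one flat dict per top-level entry (assumed layout, not from the question)."""
    for key, entry in data.items():
        row = {"key": key}
        for field, value in entry.items():
            # Keep plain strings as-is; serialize nested dicts and lists as JSON text.
            row[field] = value if isinstance(value, str) else json.dumps(value)
        yield row


rows = list(corpus_to_rows(corpus))

# Collect every column name that appears in any row, keeping "key" first.
fieldnames = ["key"] + sorted({name for row in rows for name in row} - {"key"})

output = io.StringIO()  # stand-in for open('output.tsv', 'w', newline='')
writer = csv.DictWriter(output, fieldnames, delimiter="\t", restval="")
writer.writeheader()
writer.writerows(rows)  # rows is a list of dicts, which is what writerows() expects

tsv_text = output.getvalue()
print(tsv_text)
```

Rows that lack a field get an empty cell through restval, so entries holding only metadata and entries holding only subtitles can share one header.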
