Convert a JSON file to TSV using Python

I have a corpus.json file that needs to be converted to TSV format. It is a huge file and looks like this:

{'0': {'metadata': {'id': 'fQ3JoXLXxc4', 'title': '| Board Questions | 12 Maths | Equivalence Class | Equivalence Class Board Questions |', 'tags': ['Board Questions', '12 maths', '12 maths Board Questions', 'Previous Year Board Questions', 'Maths Board Questions', 'Board questions based on Equivalence Classes', 'Equivalence Class', 'Equivalence Classes in hindi'], 'description': 'Board Questions, 12 maths, 12 maths Board Questions, Previous Year Board Questions, Maths Board Questions, Board questions based on Equivalence Classes, Equivalence Class, Equivalence Classes in hindi, Equivalence Class for 12 maths, NCERT CBSE XII Maths,'}}, '1': {'subtitles': ' in this video were going to start taking a look at entropy and tropi and more specifically the kind of entropy we are going to be interested in is information entropy information entropy as opposed to another kind of entropy which you may have heard a probably heard of thermodynamic entropy information entropy comes up in the context of information theory there is actually a direct connection with thermodynamic entropy but were not going to address that here so what is entropy what is information entropy well you can think about it sort of intuitively as the uncertainty uncertainty put that in quotes since we dont really have a definition for uncertainty but you can think about it as the uncertainty in a random variable or random quantity or equivalently you can think about it as the information ....and so on

I am using the following code:

import json
import csv
with open('Downloads/corpus.json') as json_file:  
    j = json.load(json_file)
with open('output.tsv', 'w') as output_file:
    dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
    dw.writeheader()
    dw.writerows(j)

I get the following error:

 ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-110-a9cb3b17fdd1> in <module>()
      2     dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
      3     dw.writeheader()
----> 4     dw.writerows(j)

~/anaconda3/lib/python3.6/csv.py in writerows(self, rowdicts)
    156 
    157     def writerows(self, rowdicts):
--> 158         return self.writer.writerows(map(self._dict_to_list, rowdicts))
    159 
    160 # Guard Sniffer's type checking against builds that exclude complex()

~/anaconda3/lib/python3.6/csv.py in _dict_to_list(self, rowdict)
    146     def _dict_to_list(self, rowdict):
    147         if self.extrasaction == "raise":
--> 148             wrong_fields = rowdict.keys() - self.fieldnames
    149             if wrong_fields:
    150                 raise ValueError("dict contains fields not in fieldnames: "

AttributeError: 'str' object has no attribute 'keys'

What should be changed in this code? Or is there another way to do this?

j is your JSON-like object; it's a dictionary. Without knowing exactly what you're trying to do, I think you don't need py_str=json.dumps(j), as that turns your JSON-like dict back into a string (which doesn't have keys).

Python json documentation

Some example interactive terminal commands:

>>> import json
>>> py_str = json.loads('{ "a": "b", "c": "d"}')
>>> py_str
{'a': 'b', 'c': 'd'}
>>> json.dumps(py_str)
'{"a": "b", "c": "d"}'
>>> py_str.keys()
dict_keys(['a', 'c'])
>>> json.dumps(py_str)[0]
'{'  # This is the cause of the failure
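
A related pitfall, sketched here on the assumption that the loaded object is a dict keyed by strings (as the question's sample suggests): iterating over a dict yields its keys, so passing the dict itself to writerows hands DictWriter plain strings and fails with the same kind of AttributeError. The sample data and variable names below are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for the loaded corpus: a dict of dicts keyed by strings.
j = {"0": {"metadata": {"id": "abc"}}, "1": {"subtitles": "some text"}}

# Iterating a dict yields its keys, which are plain strings here.
rows_seen = list(j)

buffer = io.StringIO()
dw = csv.DictWriter(buffer, sorted(j.keys()), delimiter="\t")
dw.writeheader()

try:
    # Each "row" handed to DictWriter is a str such as "0", not a dict.
    dw.writerows(j)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
    print(exc)
```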

I'm not sure if I'm missing something here, but in this block:

with open('Downloads/corpus.json') as json_file:  
    j = json.load(json_file)

your j is a dictionary containing the JSON data. But in this line:

py_str=json.dumps(j)

you're converting that dict to a string (essentially undoing what you just did). The error you're seeing is stating that strings don't have keys.

You should use j instead of py_str when calling the keys() method.

Your code is almost correct. The only problem is that you are converting the JSON dict object back to a str, as mentioned in another answer, which doesn't make sense.

What did you want to achieve with sorted(py_str[0].keys())? Try it without [0].

Small detail: You can use one with statement instead of two:

import json
import csv

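# newline='' is what the csv module recommends for files it writes to.
with open('output.tsv', 'w', newline='') as output_file, open('Downloads/corpus.json') as json_file:
    json_dict = json.load(json_file)
    dw = csv.DictWriter(output_file, sorted(json_dict.keys()), delimiter='\t')
    dw.writeheader()
    # writerows() expects an iterable of dicts; iterating json_dict itself yields
    # its string keys, so write the whole dict as a single row instead.
    dw.writerow(json_dict)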
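
If the goal is one TSV row per top-level entry rather than one column per entry (a layout the question does not specify), a minimal sketch might look like the following. The helper name corpus_to_tsv, the extra "key" column, and the choice to store nested values as JSON text are all assumptions made for illustration, not something taken from the original post.

```python
import csv
import io
import json


def corpus_to_tsv(corpus, output_file):
    """Write one TSV row per top-level entry of a dict of dicts (assumed layout)."""
    # Collect every field name used by any entry so the header covers all rows.
    fieldnames = sorted({name for entry in corpus.values() for name in entry})
    dw = csv.DictWriter(output_file, ["key"] + fieldnames, delimiter="\t", restval="")
    dw.writeheader()
    for key, entry in corpus.items():
        row = {"key": key}
        for name, value in entry.items():
            # Nested dicts/lists are stored as JSON text; strings are kept as-is.
            row[name] = value if isinstance(value, str) else json.dumps(value)
        dw.writerow(row)


# Made-up sample in the same shape as the question's data.
sample = {"0": {"metadata": {"id": "abc"}}, "1": {"subtitles": "some text"}}
buffer = io.StringIO()
corpus_to_tsv(sample, buffer)
tsv_lines = buffer.getvalue().splitlines()
```

With a real file, a handle opened with open('output.tsv', 'w', newline='') could be passed in place of the in-memory buffer. Entries that lack a field simply get an empty cell in that column.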
