
How to remove comment lines from a JSON file in Python

I am receiving a JSON file in the following format:

// 20170407
// http://info.employeeportal.org

{
  "EmployeeDataList": [
    {
      "EmployeeCode": "200005ABH9",
      "Skill": CT70,
      "Sales": 0.0,
      "LostSales": 1010.4
    }
  ]
}

I need to remove the extra comment lines from the file.

I tried the following code:

import json
import commentjson

with open('EmployeeDataList.json') as json_data:
    employee_data = json.load(json_data)
    # Other attempts (commented out):
    # employee_data = json.dump(json.load(json_data))
    # employee_data = commentjson.load(json_data)
    print(employee_data)

I am still not able to remove the comments from the file and get it into valid JSON format.

I can't figure out where things are going wrong. Any direction would be highly appreciated. Thanks in advance.

You're not using commentjson correctly. It has the same interface as the json module:

import commentjson

with open('EmployeeDataList.json', 'r') as handle:
    employee_data = commentjson.load(handle)

print(employee_data)

Although in this case, your comments are simple enough that you probably don't need to install an extra module to remove them:

import json

with open('EmployeeDataList.json', 'r') as handle:
    fixed_json = ''.join(line for line in handle if not line.startswith('//'))
    employee_data = json.loads(fixed_json)

print(employee_data)

Note that the second snippet uses json.loads instead of json.load, since you're parsing a string rather than a file object.
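To make the load/loads distinction concrete, here is a minimal sketch that uses io.StringIO as a stand-in for the opened file. The sample text is an assumption: it mirrors the question's layout but quotes "CT70", because a bare CT70 is not valid JSON even after the comments are removed.

```python
import io
import json

# Hypothetical sample: same shape as the question's file, with CT70 quoted
RAW = """// 20170407
// http://info.employeeportal.org

{"EmployeeDataList": [{"EmployeeCode": "200005ABH9", "Skill": "CT70",
  "Sales": 0.0, "LostSales": 1010.4}]}
"""

# Iterate over a file-like object and drop the comment lines
handle = io.StringIO(RAW)
cleaned = "".join(line for line in handle if not line.startswith("//"))

# json.loads parses a str
from_string = json.loads(cleaned)

# json.load parses a file-like object (anything with .read()),
# so wrapping the cleaned str in StringIO gives the same result
from_file_obj = json.load(io.StringIO(cleaned))

print(from_string == from_file_obj)  # True
```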

Try JSON-minify:

JSON-minify minifies blocks of JSON-like content into valid JSON by removing all whitespace and JS-style comments (single-line // and multiline /* .. */).
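The following is not JSON-minify's code. It is a hand-written sketch (strip_js_comments is a hypothetical name) of just the comment-removal part described above: it drops // and /* ... */ comments but, unlike JSON-minify, leaves whitespace alone. It tracks whether it is inside a string literal, so a value such as "http://..." is not mistaken for a comment, which a naive regex on // would get wrong.

```python
import json


def strip_js_comments(text: str) -> str:
    """Remove // line comments and /* */ block comments outside JSON strings.

    Hypothetical helper for illustration only; whitespace is preserved.
    """
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim, so \" does not end the string
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line but keep the newline itself
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            # Skip past the closing */ (or to the end if it is unterminated)
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = """// 20170407
/* block comment */
{"url": "http://info.employeeportal.org", "Sales": 0.0} // trailing
"""
print(json.loads(strip_js_comments(sample)))
```

The state machine is the reason this works on inputs where a simple line filter or regex would not: comments can appear at the end of a line or span several lines, and "//" can legitimately occur inside string values.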

I usually read the JSON as a normal file, delete the comments and then parse it as a JSON string. It can be done in one line with the following snippet:

import json
with open(path, 'r') as f: jsonDict = json.loads(''.join(row for row in f if not row.lstrip().startswith("//")))

IMHO, it is very convenient because it does not need commentjson or any other non-standard library.

Well, that's not valid JSON, so just open it as you would a text document and delete everything from // to the end of the line (\n).

with open("EmployeeDataList.json", "r") as rf:
    with open("output.json", "w") as wf:
        for line in rf:
            # Skip comment lines; copy everything else unchanged
            if line.startswith("//"):
                continue
            wf.write(line)
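To check that the rewritten file really loads as JSON, here is a self-contained sketch that runs the same line filter inside a temporary directory and reads the result back with json.load. The helper name copy_without_comment_lines and the sample (with "CT70" quoted, since the bare value is not valid JSON) are assumptions for illustration. It also uses lstrip(), so indented // lines are dropped as well.

```python
import json
import os
import tempfile

# Hypothetical input: the question's layout, with CT70 quoted
SAMPLE = (
    "// 20170407\n"
    "// http://info.employeeportal.org\n"
    "\n"
    '{"EmployeeDataList": [{"EmployeeCode": "200005ABH9", "Skill": "CT70",'
    ' "Sales": 0.0, "LostSales": 1010.4}]}\n'
)


def copy_without_comment_lines(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping lines whose first non-blank chars are //."""
    with open(src_path, "r") as rf, open(dst_path, "w") as wf:
        for line in rf:
            if line.lstrip().startswith("//"):
                continue
            wf.write(line)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "EmployeeDataList.json")
    dst = os.path.join(tmp, "output.json")
    with open(src, "w") as f:
        f.write(SAMPLE)
    copy_without_comment_lines(src, dst)
    # The cleaned copy should now parse with the standard json module
    with open(dst, "r") as f:
        data = json.load(f)

print(data["EmployeeDataList"][0]["EmployeeCode"])  # 200005ABH9
```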

Your file is parsable using HOCON.

pip install pyhocon

>>> from pyhocon import ConfigFactory
>>> conf = ConfigFactory.parse_file('data.txt')
>>> conf
ConfigTree([('EmployeeDataList',
             [ConfigTree([('EmployeeCode', '200005ABH9'),
                          ('Skill', 'CT70'),
                          ('Sales', 0.0),
                          ('LostSales', 1010.4)])])])

If the file always starts with the same number of lines before the JSON, you can just do:

with open('EmployeeDataList.NOTjson', "r") as fh:
    rawText = fh.read()
# Keep everything after the third newline, i.e. drop the first 3 lines
json_data = rawText.split("\n", 3)[3]

This way json_data is now the string of text without the first 3 lines.
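If you would rather skip the leading lines while reading instead of slicing the whole string afterwards, itertools.islice can drop them lazily from the file object. This is a minimal sketch: io.StringIO stands in for the real file, and "CT70" is quoted as an assumption, since the bare CT70 in the question is not valid JSON.

```python
import io
import itertools
import json

# Hypothetical stand-in for open('EmployeeDataList.NOTjson')
fh = io.StringIO(
    "// 20170407\n"
    "// http://info.employeeportal.org\n"
    "\n"
    '{"EmployeeDataList": [{"EmployeeCode": "200005ABH9", "Skill": "CT70",'
    ' "Sales": 0.0, "LostSales": 1010.4}]}\n'
)

# islice(fh, 3, None) skips the first 3 lines and yields the rest
json_text = "".join(itertools.islice(fh, 3, None))
employee_data = json.loads(json_text)
print(employee_data["EmployeeDataList"][0]["Skill"])  # CT70
```

Like the slicing approach, this only works if the number of leading lines is fixed; if it varies, filtering on the // prefix is more robust.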

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM