
Loading a JSON file and formatting the data in Python

I have a JSON file of Stack Overflow questions that looks like this. The first line holds the number of JSON objects in the file; here it is 2:

   2
    {"topic":"electronics","question":"What is the effective differencial effective of this circuit","excerpt":"I'm trying to work out, in general terms, the effective capacitance of this circuit (see diagram: http://i.stack.imgur.com/BS85b.png).  \n\nWhat is the effective capacitance of this circuit and will the ...\r\n        "}
    {"topic":"electronics","question":"Heat sensor with fan cooling","excerpt":"Can I know which component senses heat or acts as heat sensor in the following circuit?\nIn the given diagram, it is said that the 4148 diode acts as the sensor. But basically it is a zener diode and ...\r\n        "}

Each object has the following fields:

    question (string): The text in the title of the question.
    excerpt (string): An excerpt of the question body.
    topic (string): The topic under which the question was posted.

I am learning ML, and I want to parse the data into the following format:

data[i][0] = question
data[i][1] = excerpt
data[i][2] = topic

so that I can train my classifier. I am new to Python; is there a better way to represent this data, given that I am using it as training data?
I have written this code, but it does not work and gives me an error:

import json

with open("ml.json") as t:
    data = json.load(t)
    print(data)

Assuming that each line after the first contains exactly one object (no object spans more than one line), the following function will work. It is a generator, so it is memory-efficient: it parses and yields one object at a time instead of loading the whole file at once.

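import json
import sys
from pprint import pprint


def loadJsonLines(filePath):
    """Yield one parsed object per line, after the leading count line."""
    with open(filePath) as fp:
        # The first line holds the number of objects that follow.
        objCount = int(fp.readline().strip())
        for _ in range(objCount):
            # Each following line is one complete JSON object.
            yield json.loads(fp.readline())


if __name__ == '__main__':
    # Iterate lazily: only one parsed object is held at a time.
    for obj in loadJsonLines(sys.argv[1]):
        pprint(obj)

    # Or materialize all objects into a list when you need random access.
    objList = list(loadJsonLines(sys.argv[1]))
    pprint(objList)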
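To get from the parsed objects to the data[i][0..2] layout asked about in the question, a small conversion step is enough. The sketch below is an illustration, not part of the original answer: `to_rows` and `split_features_labels` are hypothetical helper names, and joining the question title with the excerpt into one input text is only one assumption about what a classifier might consume.

```python
from typing import Iterable, List, Tuple


def to_rows(objects: Iterable[dict]) -> List[List[str]]:
    """Convert parsed objects into [question, excerpt, topic] rows.

    Raises KeyError if an object is missing one of the three fields,
    so malformed training data fails loudly instead of silently.
    """
    return [[obj["question"], obj["excerpt"], obj["topic"]] for obj in objects]


def split_features_labels(rows: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Split rows into input texts and target labels.

    Assumption: the question title and excerpt are joined with a space
    to form a single text per sample, and the topic is the label.
    """
    texts = [question + " " + excerpt for question, excerpt, _ in rows]
    labels = [topic for _, _, topic in rows]
    return texts, labels
```

With the answer's generator, usage would look like `data = to_rows(loadJsonLines("ml.json"))`, after which `data[i][0]` is the question, `data[i][1]` the excerpt and `data[i][2]` the topic. Keeping texts and labels in two parallel lists is a common shape for training a text classifier, but which representation fits best depends on the library you use.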

Also note that your file is not a JSON file: although every line except the first (which is an integer) contains JSON data, the file as a whole is not valid JSON, so you should not give it a .json extension.
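A quick way to see this in practice is to feed a shortened, hypothetical copy of the format to `json.loads` as one document, and then parse it line by line instead. The sample string and the helper names `parses_as_single_json` and `parse_each_line` are illustrative assumptions, not taken from the original file; `parse_each_line` relies on the same one-object-per-line assumption as the answer's generator.

```python
import json

# Hypothetical, shortened stand-in for ml.json: a count line, then one object per line.
text = (
    '2\n'
    '{"topic": "electronics", "question": "Q1", "excerpt": "E1"}\n'
    '{"topic": "electronics", "question": "Q2", "excerpt": "E2"}\n'
)


def parses_as_single_json(s: str) -> bool:
    """Return True if the whole string is one valid JSON document."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError as err:
        # The leading "2" is itself a complete JSON value, so the decoder
        # stops there and reports everything after it as "Extra data".
        print(err.msg)
        return False


def parse_each_line(s: str) -> list:
    """Parse the count line, then decode that many following lines individually."""
    lines = s.splitlines()
    count = int(lines[0])
    return [json.loads(line) for line in lines[1:1 + count]]
```

Here `parses_as_single_json(text)` returns False, which is essentially what happens inside `json.load` in the question's code, while `parse_each_line(text)` returns the two dictionaries.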
