I have a JSON file from Stack Overflow that looks like this.
The first line gives the number of JSON objects that follow; in this example it is 2:
2
{"topic":"electronics","question":"What is the effective differencial effective of this circuit","excerpt":"I'm trying to work out, in general terms, the effective capacitance of this circuit (see diagram: http://i.stack.imgur.com/BS85b.png). \n\nWhat is the effective capacitance of this circuit and will the ...\r\n "}
{"topic":"electronics","question":"Heat sensor with fan cooling","excerpt":"Can I know which component senses heat or acts as heat sensor in the following circuit?\nIn the given diagram, it is said that the 4148 diode acts as the sensor. But basically it is a zener diode and ...\r\n "}
The fields of each JSON object are described below:
question (string) : The text in the title of the question.
excerpt (string) : Excerpt of the question body.
topic (string) : The topic under which the question was posted
I am learning ML and I want to parse the data into the format below, so that I can train my classifier:
data[i][0] = question
data[i][1] = excerpt
data[i][2] = topic
I am new to Python. Is there a better way to represent the data, given that I am using it as training data?
I have written this code, but it does not work and gives me an error:
import json

with open("ml.json") as t:
    data = json.load(t)
print(data)
Assuming that each line after the first contains exactly one object (no object spans more than one line), this function will work. It is a generator, so it yields one object at a time and is memory-efficient.
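import json

def loadJsonLines(filePath):
    with open(filePath) as fp:
        # The first line holds the number of objects that follow.
        objCount = int(fp.readline().strip())
        for i in range(objCount):
            # Each remaining line is one complete JSON object.
            line = fp.readline()
            obj = json.loads(line)
            yield obj

if __name__ == '__main__':
    import sys
    from pprint import pprint

    # Stream the objects one at a time.
    for obj in loadJsonLines(sys.argv[1]):
        pprint(obj)

    # Or collect them all into a list.
    objList = list(loadJsonLines(sys.argv[1]))
    pprint(objList)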
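To get the data[i][0], data[i][1], data[i][2] layout asked for in the question, the parsed objects can be turned into rows. This is only a minimal sketch: `parse_rows` is a hypothetical helper name, the Q1/E1 sample values are made up, and the column order (question, excerpt, topic) is an assumption about what the asker wants.

```python
import json

def parse_rows(lines):
    """Turn count-prefixed JSON lines into [question, excerpt, topic] rows.

    `lines` is any iterable of strings, such as an open file object.
    """
    it = iter(lines)
    obj_count = int(next(it).strip())  # first line holds the object count
    rows = []
    for _ in range(obj_count):
        obj = json.loads(next(it))
        # Assumed column order: question, excerpt, topic.
        rows.append([obj["question"], obj["excerpt"], obj["topic"]])
    return rows

# Small in-memory sample in the same layout as the file in the question.
sample = [
    "2\n",
    '{"topic":"electronics","question":"Q1","excerpt":"E1"}\n',
    '{"topic":"electronics","question":"Q2","excerpt":"E2"}\n',
]
data = parse_rows(sample)
print(data[0][0])  # question of the first object
print(data[1][2])  # topic of the second object
```

For a real file, passing the open file object should work the same way, for example `with open(path) as fp: data = parse_rows(fp)`.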
Also note that your file is not a JSON file. Each line after the first (which is an integer count) contains JSON data, but the file as a whole is not valid JSON, so you should not give it a .json extension.
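The difference can be checked with a short experiment. This is a sketch on made-up sample text in the same layout, not on the asker's actual file: parsing the whole text at once fails, while parsing line by line after the count succeeds.

```python
import json

# Made-up sample text: an integer count, then one JSON object per line.
text = '2\n{"topic": "electronics"}\n{"topic": "physics"}\n'

# Parsing the whole text as one JSON document fails, because there is
# extra data after the first value.
try:
    json.loads(text)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Parsing each line after the count on its own succeeds.
lines = text.splitlines()
objects = [json.loads(line) for line in lines[1:]]

print(whole_file_ok)
print(objects)
```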