How to remove the first few characters from the first line of each JSON file
I am relatively new to Python. I am trying to merge all the JSON files in a folder into one single JSON file. The merge itself works. However, I would like to remove some characters from the first line of every file so that the combined JSON is valid.
# Script to combine all jsons but need to remove the closing , at the end
import glob
import re

# read the whole folder
read_files = glob.glob("bus_stop_1012/*.json")
with open("bus_stop_1012/bus_arrival_1012.json", "wb") as outfile:
    # this is the beginning of the combined file
    outfile.write(b' ')
    for f in read_files:
        # will append each data file
        with open(f, "rb") as infile:
            outfile.write(infile.read())
        # will have to add , at the end of each element
        outfile.write(b',')
    # move back 1 byte to remove the last , and end the file
    # (a relative seek like this only works on a binary-mode file in Python 3)
    outfile.seek(-1, 1)
    outfile.write(b']}')
For an example folder containing two JSON files, this generates the following single file:
{"data": [{"time": "2016-03-02 17:45:20 SGT+0800", "result":{
    "BusStopID": "1012",
    "Services": [
        {
            "NextBus": {
                "EstimatedArrival": "2016-03-02T17:48:21+08:00",
                "Feature": "WAB",
                "Latitude": "1.2871405",
                "Load": "Seats Available",
                "Longitude": "103.8456715",
                "VisitNumber": "1"
            },
            "Operator": "SBST",
            "OriginatingID": "10589",
            "ServiceNo": "12",
            "Status": "In Operation",
            "SubsequentBus": {
                "EstimatedArrival": "2016-03-02T17:56:02+08:00",
                "Feature": "WAB",
                "Latitude": "0",
                "Load": "Seats Available",
                "Longitude": "0",
                "VisitNumber": "1"
            },
            "SubsequentBus3": {
                "EstimatedArrival": "2016-03-02T18:06:02+08:00",
                "Feature": "WAB",
                "Latitude": "0",
                "Load": "Seats Available",
                "Longitude": "0",
                "VisitNumber": "1"
            },
            "TerminatingID": "77009"
        }
    ],
    "odata.metadata": "http://datamall2.mytransport.sg/ltaodataservice/$metadata#BusArrival/@Element"
}},{"data": [{"time": "2016-03-02 17:49:36 SGT+0800", "result":{
    "BusStopID": "1012",
    "Services": [
        {
            "NextBus": {
                "EstimatedArrival": "2016-03-02T17:48:47+08:00",
                "Feature": "WAB",
                "Latitude": "1.2944553333333333",
                "Load": "Seats Available",
                "Longitude": "103.85045283333334",
                "VisitNumber": "1"
            },
            "Operator": "SBST",
            "OriginatingID": "10589",
            "ServiceNo": "12",
            "Status": "In Operation",
            "SubsequentBus": {
                "EstimatedArrival": "2016-03-02T17:58:26+08:00",
                "Feature": "WAB",
                "Latitude": "1.2821243333333334",
                "Load": "Seats Available",
                "Longitude": "103.841401",
                "VisitNumber": "1"
            },
            "SubsequentBus3": {
                "EstimatedArrival": "2016-03-02T18:06:02+08:00",
                "Feature": "WAB",
                "Latitude": "0",
                "Load": "Seats Available",
                "Longitude": "0",
                "VisitNumber": "1"
            },
            "TerminatingID": "77009"
        }
    ],
    "odata.metadata": "http://datamall2.mytransport.sg/ltaodataservice/$metadata#BusArrival/@Element"
}}]}
I need the {"data": [ at the start of every file after the first to be removed, since every JSON file begins with it.
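Doing literally what is asked here, stripping the prefix as text, can be sketched with the re module the script already imports. This assumes, as the sample output suggests, that each file starts with {"data": [ (spacing may vary) and then holds exactly one list element with no closing ]}. If your files do end in ]}, that suffix would have to be stripped as well. merge_stripping_prefix is a hypothetical helper name, not part of any library.

```python
import os
import re

# Assumed file layout: '{"data": [' followed by ONE list element, no closing ']}'.
DATA_PREFIX = re.compile(r'\s*\{\s*"data"\s*:\s*\[')


def merge_stripping_prefix(input_paths, output_path):
    """Write '{"data": [' once, then each file's content minus its own prefix,
    separated by commas, then ']}'. Returns the number of files merged."""
    merged = 0
    with open(output_path, "w") as outfile:
        outfile.write('{"data": [')
        sep = ""
        for path in input_paths:
            # Skip the output file if the input glob happens to match it.
            if os.path.abspath(path) == os.path.abspath(output_path):
                continue
            with open(path) as infile:
                text = infile.read()
            match = DATA_PREFIX.match(text)
            if match is None:
                print("Skipping {}: unexpected start of file".format(path))
                continue
            # Comma goes BEFORE every element except the first, so none trails.
            outfile.write(sep)
            # Keep everything after the prefix, minus trailing whitespace.
            outfile.write(text[match.end():].rstrip())
            sep = ","
            merged += 1
        outfile.write("]}")
    return merged
```

A call such as merge_stripping_prefix(sorted(glob.glob("bus_stop_1012/*.json")), "bus_stop_1012/bus_arrival_1012.json") would replace the question's script; writing the separator before each element, rather than after, removes the need for the seek(-1, 1) trick.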
You could decode each file from JSON, extract the elements you want, and then write those out as JSON again.
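In its simplest form, that means loading every file, collecting the elements of each "data" list, and dumping one combined object at the end. A minimal sketch, assuming every input file is valid JSON with a top-level "data" list and that the folder is small enough to hold in memory; merge_data_lists is a hypothetical helper name:

```python
import json
import os


def merge_data_lists(input_paths, output_path):
    """Collect the "data" elements of every input file into one list and write
    a single {"data": [...]} document. Returns the number of elements written."""
    merged = []
    for path in input_paths:
        # Skip the output file so a rerun does not merge the old result back in.
        if os.path.abspath(path) == os.path.abspath(output_path):
            continue
        with open(path) as infile:
            try:
                merged.extend(json.load(infile)["data"])
            except ValueError:  # json.JSONDecodeError is a subclass of ValueError
                print("Failed to load {}".format(path))
    # json.dump writes the commas itself, so no trailing comma can appear.
    with open(output_path, "w") as outfile:
        json.dump({"data": merged}, outfile, indent=2)
    return len(merged)
```

The trade-off is memory: every element is held at once, which is why the streaming variant that follows writes elements one by one instead.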
If the goal is to produce one large {"data": [...]} list, you can get away with writing each element of the list separately, as long as you take care not to write a trailing comma:
import glob
import json
import os

output_path = "bus_stop_1012/bus_arrival_1012.json"
# read the whole folder
read_files = glob.glob("bus_stop_1012/*.json")
with open(output_path, "w") as outfile:
    # this is the beginning of the combined file
    outfile.write('{"data": [\n')
    sep = ''
    for f in read_files:
        # don't read the combined file back in on a second run
        if os.path.abspath(f) == os.path.abspath(output_path):
            continue
        # will append each data file
        with open(f) as infile:
            try:
                for obj in json.load(infile)['data']:
                    # a comma goes before every element except the first
                    outfile.write(sep)
                    json.dump(obj, outfile)
                    sep = ','
            except ValueError:
                print('Failed to load {}'.format(f))
    outfile.write(']}')
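Whichever approach you use, it is worth parsing the merged file once to confirm it really is valid JSON: a leftover trailing comma, the exact problem the sep variable avoids, makes json.load raise ValueError. A minimal check, where check_merged is a hypothetical helper name:

```python
import json


def check_merged(path):
    """Parse the merged file and return how many elements its "data" list has.
    Raises ValueError (json.JSONDecodeError) if the file is not valid JSON."""
    with open(path) as infile:
        doc = json.load(infile)
    return len(doc["data"])
```

For example, check_merged("bus_stop_1012/bus_arrival_1012.json") should return the number of snapshots merged; Python's json module rejects input such as {"data": [{"a": 1},]} because of the comma before the closing bracket.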