How to remove the first few characters from the first line of every JSON file

Question

I am relatively new to Python. I am trying to merge all the JSON files in a folder into a single JSON file. The merge itself works. However, I would like to remove some characters from the first line of every file so that the combined JSON is valid.

# Script to combine all JSON files, but the trailing "," at the end needs to be removed

import glob

# read the whole folder
read_files = glob.glob("bus_stop_1012/*.json")

with open("bus_stop_1012/bus_arrival_1012.json", "wb") as outfile:
    # this is the beginning of the combined file
    outfile.write(b' ')

    for f in read_files:
        # append each data file
        with open(f, "rb") as infile:
            outfile.write(infile.read())
            # add a "," after each element
            outfile.write(b',')

    # move back 1 character to overwrite the last "," and end the file
    outfile.seek(-1, 1)
    outfile.write(b']}')

This generates the following single JSON file from an example of two JSON files:

{"data": [{"time": "2016-03-02 17:45:20 SGT+0800", "result":{
"BusStopID": "1012", 
"Services": [
    {
        "NextBus": {
            "EstimatedArrival": "2016-03-02T17:48:21+08:00", 
            "Feature": "WAB", 
            "Latitude": "1.2871405", 
            "Load": "Seats Available", 
            "Longitude": "103.8456715", 
            "VisitNumber": "1"
        }, 
        "Operator": "SBST", 
        "OriginatingID": "10589", 
        "ServiceNo": "12", 
        "Status": "In Operation", 
        "SubsequentBus": {
            "EstimatedArrival": "2016-03-02T17:56:02+08:00", 
            "Feature": "WAB", 
            "Latitude": "0", 
            "Load": "Seats Available", 
            "Longitude": "0", 
            "VisitNumber": "1"
        }, 
        "SubsequentBus3": {
            "EstimatedArrival": "2016-03-02T18:06:02+08:00", 
            "Feature": "WAB", 
            "Latitude": "0", 
            "Load": "Seats Available", 
            "Longitude": "0", 
            "VisitNumber": "1"
        }, 
        "TerminatingID": "77009"
    }
], 
"odata.metadata":
"http://datamall2.mytransport.sg/ltaodataservice/$metadata#BusArrival/@Element"
}},{"data": [{"time": "2016-03-02 17:49:36 SGT+0800", "result":{
"BusStopID": "1012", 
"Services": [
    {
        "NextBus": {
            "EstimatedArrival": "2016-03-02T17:48:47+08:00", 
            "Feature": "WAB", 
            "Latitude": "1.2944553333333333", 
            "Load": "Seats Available", 
            "Longitude": "103.85045283333334", 
            "VisitNumber": "1"
        }, 
        "Operator": "SBST", 
        "OriginatingID": "10589", 
        "ServiceNo": "12", 
        "Status": "In Operation", 
        "SubsequentBus": {
            "EstimatedArrival": "2016-03-02T17:58:26+08:00", 
            "Feature": "WAB", 
            "Latitude": "1.2821243333333334", 
            "Load": "Seats Available", 
            "Longitude": "103.841401", 
            "VisitNumber": "1"
        }, 
        "SubsequentBus3": {
            "EstimatedArrival": "2016-03-02T18:06:02+08:00", 
            "Feature": "WAB", 
            "Latitude": "0", 
            "Load": "Seats Available", 
            "Longitude": "0", 
            "VisitNumber": "1"
        }, 
        "TerminatingID": "77009"
    }
    ], 
"odata.metadata":     "http://datamall2.mytransport.sg/ltaodataservice/$metadata#BusArrival/@Element"
}}]}

I need the leading {"data": [ removed from every JSON file after the first, since each file starts with it.

Answer 1

You could decode from JSON, extract the elements you want, then write those out as JSON again.
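As a minimal sketch of that decode, extract, re-encode idea, everything can be collected in memory and written out once. This assumes each input file parses as valid JSON with a top-level "data" list; merge_data_lists and merge_folder are names made up for illustration, not something from the question.

```python
import glob
import json
import os


def merge_data_lists(paths):
    """Return one {"data": [...]} dict built from the "data" list of each file."""
    merged = []
    for path in paths:
        with open(path) as infile:
            # Assumption: every file is valid JSON with a top-level "data" list.
            merged.extend(json.load(infile)["data"])
    return {"data": merged}


def merge_folder(pattern, out_path):
    """Merge all files matching pattern into out_path; return the element count."""
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself in case it is left over from an earlier run.
    paths = [p for p in sorted(glob.glob(pattern)) if os.path.abspath(p) != out_abs]
    # Read everything first, then open the output file for writing.
    combined = merge_data_lists(paths)
    with open(out_path, "w") as outfile:
        json.dump(combined, outfile)
    return len(combined["data"])
```

For the folder in the question, the call would be merge_folder("bus_stop_1012/*.json", "bus_stop_1012/bus_arrival_1012.json"). Building the list in memory lets json.dump take care of the commas, at the cost of holding all the data at once.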

If the goal is to produce one large {"data": [....]} list, you can get away with writing each element of the list separately, as long as you take care not to write a trailing comma:

import glob
import json

# read the whole folder
read_files = glob.glob("bus_stop_1012/*.json")

# open in text mode, because json.dump() writes str
with open("bus_stop_1012/bus_arrival_1012.json", "w") as outfile:
    # this is the beginning of the combined file
    outfile.write('{"data": [\n')
    sep = ''
    for f in read_files:
        # append the elements of each data file
        with open(f) as infile:
            try:
                for obj in json.load(infile)['data']:
                    # the separator is empty for the first element, "," afterwards
                    outfile.write(sep)
                    json.dump(obj, outfile)
                    sep = ','
            except ValueError:
                print('Failed to load {}'.format(f))
    outfile.write(']}')
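
One caveat: in the merged sample from the question, each file's content ends at }} and only the script adds the final ]}, which suggests the input files may not contain the closing ]} themselves. If that is the case, json.load will raise ValueError for them, and a text-level approach closer to what the question literally asks for may be needed. A rough sketch, assuming every file starts with exactly {"data": [ and has no closing ]} (strip_prefix and join_fragments are hypothetical helpers):

```python
PREFIX = '{"data": ['


def strip_prefix(text, prefix=PREFIX):
    """Drop the leading prefix from text if it is present."""
    stripped = text.lstrip()
    if stripped.startswith(prefix):
        return stripped[len(prefix):]
    return stripped


def join_fragments(texts, prefix=PREFIX):
    """Combine file contents into one {"data": [...]} document.

    Assumption: each text is the prefix followed by one or more elements,
    without the closing "]}".
    """
    bodies = [strip_prefix(t, prefix).rstrip() for t in texts]
    # Write the prefix once, separate the bodies with commas, then close.
    return prefix + ",".join(bodies) + "]}"
```

Each element of texts would be the result of reading one input file in text mode. If the files do turn out to be complete JSON documents, this would produce invalid output, and decoding with the json module as above is the safer route.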

Answer 1
1 accepted 2016-03-06 14:35:33
