
How to remove the first few characters from the first line of each JSON file

I am relatively new to Python. I am trying to merge all the JSON files in a folder into a single JSON file. The merge itself works, but I would like to remove some characters from the first line of every file so that the combined JSON is valid.

# Script to combine all jsons but need to remove the closing , at the end

import glob
import re

# read the whole folder
read_files = glob.glob("bus_stop_1012/*.json")

with open("bus_stop_1012/bus_arrival_1012.json", "wb") as outfile:
    # this is the beginning of the combined file
    outfile.write(' ')

    for f in read_files:

        # will append each data file
        with open(f, "rb") as infile:

            outfile.write(infile.read())
            # will have to add , at the end of each element
            outfile.write(',')

    # move back 1 character to remove the last , and end the file
    outfile.seek(-1,1)
    outfile.write(']}')

For an example with two JSON files, this generates the following single JSON file:

{"data": [{"time": "2016-03-02 17:45:20 SGT+0800", "result":{
"BusStopID": "1012", 
"Services": [
    {
        "NextBus": {
            "EstimatedArrival": "2016-03-02T17:48:21+08:00", 
            "Feature": "WAB", 
            "Latitude": "1.2871405", 
            "Load": "Seats Available", 
            "Longitude": "103.8456715", 
            "VisitNumber": "1"
        }, 
        "Operator": "SBST", 
        "OriginatingID": "10589", 
        "ServiceNo": "12", 
        "Status": "In Operation", 
        "SubsequentBus": {
            "EstimatedArrival": "2016-03-02T17:56:02+08:00", 
            "Feature": "WAB", 
            "Latitude": "0", 
            "Load": "Seats Available", 
            "Longitude": "0", 
            "VisitNumber": "1"
        }, 
        "SubsequentBus3": {
            "EstimatedArrival": "2016-03-02T18:06:02+08:00", 
            "Feature": "WAB", 
            "Latitude": "0", 
            "Load": "Seats Available", 
            "Longitude": "0", 
            "VisitNumber": "1"
        }, 
        "TerminatingID": "77009"
    }
], 
"odata.metadata":
"http://datamall2.mytransport.sg/ltaodataservice/$metadata#BusArrival/@Element"
}},{"data": [{"time": "2016-03-02 17:49:36 SGT+0800", "result":{
"BusStopID": "1012", 
"Services": [
    {
        "NextBus": {
            "EstimatedArrival": "2016-03-02T17:48:47+08:00", 
            "Feature": "WAB", 
            "Latitude": "1.2944553333333333", 
            "Load": "Seats Available", 
            "Longitude": "103.85045283333334", 
            "VisitNumber": "1"
        }, 
        "Operator": "SBST", 
        "OriginatingID": "10589", 
        "ServiceNo": "12", 
        "Status": "In Operation", 
        "SubsequentBus": {
            "EstimatedArrival": "2016-03-02T17:58:26+08:00", 
            "Feature": "WAB", 
            "Latitude": "1.2821243333333334", 
            "Load": "Seats Available", 
            "Longitude": "103.841401", 
            "VisitNumber": "1"
        }, 
        "SubsequentBus3": {
            "EstimatedArrival": "2016-03-02T18:06:02+08:00", 
            "Feature": "WAB", 
            "Latitude": "0", 
            "Load": "Seats Available", 
            "Longitude": "0", 
            "VisitNumber": "1"
        }, 
        "TerminatingID": "77009"
    }
    ], 
"odata.metadata":     "http://datamall2.mytransport.sg/ltaodataservice/$metadata#BusArrival/@Element"
}}]}

I need the {"data": [ at the start of each subsequent JSON file to be removed, since it appears in every JSON file.

You could decode each file from JSON, extract the elements you want, then write those out as JSON again.
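As a minimal sketch of that decode, extract, re-encode approach: it assumes every input file parses as JSON on its own and keeps its elements under a top-level "data" key. The function name merge_data_files is hypothetical, chosen only for illustration.

```python
import glob
import json
import os


def merge_data_files(pattern, out_path):
    """Merge the "data" lists of all JSON files matching pattern into out_path.

    Hypothetical helper; assumes each input is valid JSON shaped like
    {"data": [...]}. Returns the number of merged elements.
    """
    merged = []
    for path in sorted(glob.glob(pattern)):
        # Skip the output file in case it sits in the same folder as the inputs.
        if os.path.abspath(path) == os.path.abspath(out_path):
            continue
        with open(path) as infile:
            merged.extend(json.load(infile)["data"])
    # Encoding the whole structure once means no separators to manage by hand.
    with open(out_path, "w") as outfile:
        json.dump({"data": merged}, outfile)
    return len(merged)
```

With the paths from the question, the call would be merge_data_files("bus_stop_1012/*.json", "bus_stop_1012/bus_arrival_1012.json"). This holds every element in memory at once, which is fine for small files but may not suit very large folders.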

If the goal is to produce one large {"data": [....]} list, you can get away with writing each element of the list separately, as long as you take care not to write a trailing comma:

import glob
import json

# read the whole folder
read_files = glob.glob("bus_stop_1012/*.json")

# open in text mode, because json.dump writes str
with open("bus_stop_1012/bus_arrival_1012.json", "w") as outfile:
    # this is the beginning of the combined file
    outfile.write('{"data": [\n')
    sep = ''
    for f in read_files:
        # append the elements of each data file
        with open(f) as infile:
            try:
                for obj in json.load(infile)['data']:
                    # write the separator before every element except the first
                    outfile.write(sep)
                    json.dump(obj, outfile)
                    sep = ','
            except ValueError:
                print('Failed to load {}'.format(f))
    outfile.write(']}')
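The sample output in the question suggests that each input file may stop before its closing ]}, in which case json.load would raise ValueError for it. Under that assumption (inferred from the sample, not confirmed), a text-based sketch can strip the literal {"data": [ prefix from each fragment and join the rest. The names PREFIX, strip_prefix and join_fragments are hypothetical.

```python
import json

# Assumption: every fragment starts with exactly this text.
PREFIX = '{"data": ['


def strip_prefix(text, prefix=PREFIX):
    """Return text without the leading prefix, if it is present."""
    text = text.lstrip()
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def join_fragments(fragments):
    """Join fragments shaped like '{"data": [<element>' into one JSON document.

    Assumes each fragment holds an element with no closing ']}';
    trailing whitespace and a trailing comma are tolerated.
    """
    elements = [strip_prefix(f).rstrip().rstrip(',') for f in fragments]
    return PREFIX + ','.join(elements) + ']}'


if __name__ == '__main__':
    parts = ['{"data": [{"time": "a"}', '{"data": [{"time": "b"}']
    # json.loads raises ValueError if the joined text is still not valid JSON.
    print(json.loads(join_fragments(parts)))
```

Parsing the joined text with json.loads before writing it out is a cheap check that the assumption about the fragments actually held.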
