
How to add commas in between JSON objects present in a .txt file and then convert it into a JSON array in Python

I am reading a .txt file containing JSON objects that are not separated by commas. I would like to add commas between the JSON objects and put them all into a JSON list (array).

I have tried json.loads, but I get a JSONDecodeError. So I realized I need to put commas between the different objects in the .txt file.
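To see why json.loads fails, here is a minimal check on a made-up two-object string (not the real file). The decoder successfully parses the first object, finds characters left over, and reports them as "Extra data" at the position where the second object starts. The helper name first_error is hypothetical, used only for illustration.

```python
import json


def first_error(text):
    """Hypothetical helper: return (message, position) of the JSONDecodeError, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.pos
    return None


print(first_error('{"a": 1}{"b": 2}'))      # ('Extra data', 8): objects back to back
print(first_error('[{"a": 1}, {"b": 2}]'))  # None: the same objects inside a JSON array
```

So wrapping the objects in [ ] and separating them with commas is exactly what json.loads needs.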

Below is an example of the file content:

{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
        "ftail": "\n",
        "ftext": "Sanjeev Saxena"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "607-619"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1996"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "33"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "number": {
        "ftail": "\n",
        "ftext": "7"
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
}{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
        "ftail": "\n",
        "ftext": "Hans-Ulrich Simon"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "227-248"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1983"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "20"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
}


Expected result:


[
{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
        "ftail": "\n",
        "ftext": "Sanjeev Saxena"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "607-619"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1996"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "33"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "number": {
        "ftail": "\n",
        "ftext": "7"
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
},
{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
        "ftail": "\n",
        "ftext": "Hans-Ulrich Simon"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "227-248"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1983"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "20"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
}
]


You can add a comma between objects with a regular expression (this is a plain text substitution, so it assumes no string value contains "}{"):

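import re

# src/dst instead of input/output, so the built-in input() is not shadowed
with open('name.txt', 'r') as src, open('out.txt', 'w') as dst:
    dst.write("[\n")
    for line in src:
        # insert a comma wherever one object closes and the next opens on the same line
        line = re.sub(r'\}\s*\{', '},{', line)
        dst.write('    ' + line)
    dst.write("]\n")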

If you can always guarantee that your JSON is formatted as in your example, i.e. each new JSON object begins on the same line where the previous one ends and the top-level braces are not indented, you can get by with reading your JSON into a buffer until you encounter such a line, then sending the buffer off for JSON parsing. Rinse and repeat:

import json

parsed = []  # a list to hold individually parsed JSON objects
with open('path/to/your.json') as f:
    buffer = ''
    for line in f:
        # startswith() instead of line[0], so an empty line cannot raise IndexError
        if line.startswith('}'):  # an unindented '}' closes the current top-level object
            parsed.append(json.loads(buffer + '}'))
            buffer = line[1:]  # the rest of the line (e.g. '{') starts the next object
        else:
            buffer += line

print(json.dumps(parsed, indent=2))  # just to make sure it all went well

Which would yield:

[
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
      "ftail": "\n",
      "ftext": "Sanjeev Saxena"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "607-619"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1996"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "33"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "number": {
      "ftail": "\n",
      "ftext": "7"
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
  },
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
      "ftail": "\n",
      "ftext": "Hans-Ulrich Simon"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "227-248"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1983"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "20"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
  }
]
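The same buffering idea can be packaged as a generator, which makes it easy to feed either an open file or an in-memory sample. This is a sketch under the same formatting assumption (every top-level object closes with a "}" at the very start of a line); iter_objects is a hypothetical name, and the final check for a leftover, unterminated object is an addition that the loop above does not do.

```python
import io
import json
from typing import Any, Iterable, Iterator


def iter_objects(lines: Iterable[str]) -> Iterator[Any]:
    """Buffer lines until an unindented '}' closes the current top-level object."""
    buffer = ''
    for line in lines:
        if line.startswith('}'):
            yield json.loads(buffer + '}')
            buffer = line[1:]  # e.g. the '{' that opens the next object
        else:
            buffer += line
    if buffer.strip():
        # leftover non-whitespace text means the last object was never closed
        raise ValueError('unterminated JSON object at end of input')


sample = io.StringIO('{\n    "id": 1\n}{\n    "id": 2\n}\n')
print(list(iter_objects(sample)))  # [{'id': 1}, {'id': 2}]
```

With a real file: with open('path/to/your.json') as f: parsed = list(iter_objects(f)).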

If your case is not as clear-cut (i.e. you can't predict the formatting), you can try one of the iterative/event-based JSON parsers (ijson, for example), which can tell you when a 'root' object is closed so that you can split the parsed JSON objects into a sequence.
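ijson itself is a third-party package. As a dependency-free illustration of the same idea (noticing the moment a root value closes), here is a hand-rolled sketch that is not ijson and does not use its API: it counts bracket depth while skipping over string literals and yields each top-level chunk as soon as the depth returns to zero. split_top_level is a hypothetical name, and the sketch assumes well-formed input.

```python
import json
from typing import Iterator


def split_top_level(text: str) -> Iterator[str]:
    """Yield each top-level {...} or [...] chunk of concatenated JSON text.

    Tracks nesting depth and ignores brackets that appear inside string literals.
    """
    depth = 0
    start = None
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False  # this character was escaped, e.g. \"
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in '{[':
            if depth == 0:
                start = i  # a new root value begins here
            depth += 1
        elif ch in '}]':
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]  # the root value just closed


sample = '{"a": "x}{y", "b": [1, 2]}{"c": {"d": 3}}'
objects = [json.loads(chunk) for chunk in split_top_level(sample)]
print(objects)  # [{'a': 'x}{y', 'b': [1, 2]}, {'c': {'d': 3}}]
```

Because string literals are skipped, a "}{" inside a value (as in "x}{y") does not cause a false split.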

UPDATE: On second thought, you don't need anything beyond the built-in json module, even if your concatenated JSON documents are not properly indented, or not indented at all: you can use json.JSONDecoder.raw_decode() (and its undocumented second parameter, the start index) to traverse your data and pick out valid JSON structures one at a time until you've consumed the whole file (or hit an error). For example:

import json

parser = json.JSONDecoder()
parsed = []  # a list to hold individually parsed JSON structures
with open('test.json') as f:
    data = f.read()
head = 0  # hold the current position as we parse
while True:
    # jump to whichever comes first: the next '{' or the next '['
    starts = [pos for pos in (data.find('{', head), data.find('[', head)) if pos != -1]
    if not starts:
        break  # no more objects or arrays left in the data
    head = min(starts)
    try:
        struct, head = parser.raw_decode(data, head)
        parsed.append(struct)
    except json.JSONDecodeError:  # no valid JSON structure starts here
        break

print(json.dumps(parsed, indent=2))  # make sure it all went well

This should give you the same result as above, but this time it won't depend on } being the first character of a new line whenever one of your JSON objects closes. It should also work for JSON arrays stacked back-to-back.
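If you would rather fail loudly on stray text than silently stop, a stricter variant of the raw_decode loop skips only whitespace between documents and lets json.JSONDecodeError propagate. This is a minimal sketch: iter_concatenated_json is a hypothetical name, and it assumes the documents are separated by nothing but whitespace.

```python
import json
from typing import Any, Iterator


def iter_concatenated_json(text: str) -> Iterator[Any]:
    """Yield every JSON value in text, where values are separated only by whitespace."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            return
        # parse one value starting at pos; returns (value, index just past it).
        # Anything that is not valid JSON raises json.JSONDecodeError.
        value, pos = decoder.raw_decode(text, pos)
        yield value


data = '{"id": 1}\n{"id": 2}[3, 4]  "five"'
print(list(iter_concatenated_json(data)))  # [{'id': 1}, {'id': 2}, [3, 4], 'five']
```

The trade-off: the loop above tolerates junk between documents by searching for the next bracket, while this version treats any junk as an error.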

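Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.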

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM