How to parse a text file and convert it to JSON in Python

Question

I have a large file formatted like the following:

"string in quotes"
string
string
string
number
|-

...this repeats for a while. I'm trying to convert it to JSON, so that each of the chunks looks like this:

"name": "string in quotes"
"description": "string"
"info": "string"
"author": "string"
"year": number

This is what I have so far:

import shutil
import os
import urllib

myFile = open('unformatted.txt','r')
newFile = open("formatted.json", "w")

newFile.write('{'+'\n'+'list: {'+'\n')

for line in myFile:
    newFile.write()  # this is where I'm not sure what to write

newFile.write('}'+'\n'+'}')

myFile.close()
newFile.close()

I think I could do something with the line number modulo the chunk size, but I'm not sure if that's the right way to go about it.

Answer 1

You can use itertools.groupby to group the lines of each section, then json.dump each resulting dict to your JSON file:

from itertools import groupby
import json
names = ["name", "description","info","author", "year"]

with open("test.csv") as f, open("out.json","w") as out:
    grouped = groupby(map(str.rstrip,f), key=lambda x: x.startswith("|-"))
    for k,v in grouped:
        if not k:
            json.dump(dict(zip(names,v)),out)
            out.write("\n")

Input:

"string in quotes"
string
string
string
number
|-
"other string in quotes"
string2
string2
string2
number2

Output:

{"author": "string", "name": "\"string in quotes\"", "description": "string", "info": "string", "year": "number"}
{"author": "string2", "name": "\"other string in quotes\"", "description": "string2", "info": "string2", "year": "number2"}

To read the records back, iterate over the file and call json.loads on each line:

In [6]: with open("out.json") as out:
            for line in out:
                 print(json.loads(line))
   ...:         
{'name': '"string in quotes"', 'info': 'string', 'author': 'string', 'year': 'number', 'description': 'string'}
{'name': '"other string in quotes"', 'info': 'string2', 'author': 'string2', 'year': 'number2', 'description': 'string2'}
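
Note that the file written above holds one JSON object per line rather than a single JSON document. If a single JSON array is preferred, a minimal sketch along the same lines could collect the records first and serialize them once. The helper name `parse_records` and the in-memory sample are assumptions made for illustration, not part of the original answer; the surrounding quotes are stripped from the name and the year is converted only when it consists of digits.

```python
import io
import json
from itertools import groupby

NAMES = ["name", "description", "info", "author", "year"]


def parse_records(lines):
    """Group the lines between "|-" separators into dicts (hypothetical helper)."""
    stripped = (line.rstrip() for line in lines)
    records = []
    for is_sep, chunk in groupby(stripped, key=lambda s: s.startswith("|-")):
        if is_sep:
            continue  # skip the separator lines themselves
        values = [v for v in chunk if v]  # ignore blank lines inside a chunk
        if not values:
            continue
        record = dict(zip(NAMES, values))
        record["name"] = record["name"].strip('"')  # drop the surrounding quotes
        year = record.get("year")
        if year is not None and year.isdigit():
            record["year"] = int(year)  # keep non-numeric years as strings
        records.append(record)
    return records


# Made-up sample standing in for the real input file.
sample = io.StringIO(
    '"A title"\nsome description\nsome info\nsome author\n1999\n|-\n'
    '"Other"\nd2\ni2\na2\n2004\n'
)
records = parse_records(sample)
print(json.dumps(records, indent=2))
```

Passing an open file object instead of the `io.StringIO` sample should work the same way, since both yield lines when iterated.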

Answer 2

I think this would do the trick.

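import itertools
import json

with open('unformatted.txt', 'r') as f_in, open('formatted.json', 'w') as f_out:
    # Read six lines at a time: five fields plus the "|-" separator.
    # zip_longest is the Python 3 name (it was izip_longest in Python 2).
    for name, desc, info, author, yr, ignore in itertools.zip_longest(*[f_in]*6):
        record = {
            "name": '"' + name.strip() + '"',
            "description": desc.strip(),
            "info": info.strip(),
            "author": author.strip(),
            "year": int(yr.strip()),
        }
        # Write one JSON object per line so the records stay separable.
        f_out.write(json.dumps(record) + '\n')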
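
The `*[f_in]*6` expression passes the same file iterator six times, so each tuple pulls six consecutive lines from it. A small sketch of that idiom on an in-memory list, using Python 3's `itertools.zip_longest` (named `izip_longest` in Python 2); the helper name `chunks_of` and the sample lines are assumptions made for this illustration:

```python
from itertools import zip_longest


def chunks_of(iterable, n):
    """Return an iterator of n-tuples of consecutive items, padded with None."""
    it = iter(iterable)            # one shared iterator
    return zip_longest(*[it] * n)  # every tuple slot advances that same iterator


# Made-up sample: two records, the last one without a trailing "|-" line.
lines = ['"t1"', "d1", "i1", "a1", "2001", "|-",
         '"t2"', "d2", "i2", "a2", "2002"]
chunks = list(chunks_of(lines, 6))
for chunk in chunks:
    print(chunk)
```

The padding matters for the final record: when the file does not end with a separator line, the last slot is filled with None instead of raising an error.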

Answer 3

This is a rough example which does the basic job.

It uses one generator to split the input into batches of six lines, and another to pair the keys with the values.

import json


def read():
    with open('input.txt', 'r') as f:
        return [l.strip() for l in f.readlines()]


def batch(content, n=1):
    length = len(content)
    for num_idx in range(0, length, n):
        yield content[num_idx:min(num_idx+n, length)]


def emit(batched):
    for n, name in enumerate([
        'name', 'description', 'info', 'author', 'year'
    ]):
        yield name, batched[n]

content = read()
batched = batch(content, 6)
res = [dict(emit(b)) for b in batched]

print(res)

with open('output.json', 'w') as f:
    f.write(json.dumps(res, indent=4))

Update

Using this approach you can easily hook in formatting functions so that the year and name values come out correctly.

Extend the emit function like this:

def emit(batched):
    def _quotes(q):
        return q.replace('"', '')

    def _pass(p):
        return p

    def _num(n):
        try:
            return int(n)
        except ValueError:
            return n

    for n, (name, func) in enumerate([
        ('name', _quotes),
        ('description', _pass),
        ('info', _pass),
        ('author', _pass),
        ('year', _num)
    ]):
        yield name, func(batched[n])
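
To see what this per-field conversion does to a single batch, here is a compact, self-contained sketch of the same idea that pairs each key with a converter. The names `FIELDS`, `to_int` and `to_record` and the sample batch are assumptions for illustration, not the answer's own code.

```python
def to_int(value):
    """Return value as an int when possible, otherwise unchanged."""
    try:
        return int(value)
    except ValueError:
        return value


# Each key is paired with the function that cleans up its value.
FIELDS = [
    ("name", lambda s: s.replace('"', "")),  # drop the quotes
    ("description", str),
    ("info", str),
    ("author", str),
    ("year", to_int),
]


def to_record(batch):
    """Build one dict from a batch of lines; zip ignores the trailing "|-"."""
    return {name: conv(value) for (name, conv), value in zip(FIELDS, batch)}


# Made-up batch of six lines, as produced by splitting the input in sixes.
batch = ['"string in quotes"', "string", "string", "string", "2015", "|-"]
record = to_record(batch)
print(record)
```

Because `zip` stops at the shorter of its inputs, the sixth element of each batch (the separator) is dropped without any extra handling.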

3 answers

Answer 1
5, accepted, 2015-07-18 20:04:19

Answer 2
1, 2015-07-18 20:00:47

Answer 3
1, 2015-07-18 21:43:51