Combine multiple JSON files, and parse into CSV

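I have about 100 JSON files, each named after a different date, and I need to merge them into a single CSV file with the headers "date", "real_name", "text".

The JSON itself does not list a date, and real_name is nested. I haven't worked with JSON in a while and am a bit lost.

The basic structure of the JSON looks more or less like this:

Filename: 2021-01-18.json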

[
    {
        "client_msg_id": "xxxx",
        "type": "message",
        "text": "THIS IS THE TEXT I WANT TO PULL",
        "user": "XXX",
        "user_profile": {
            "first_name": "XXX",
            "real_name": "THIS IS THE NAME I WANT TO PULL",
            "display_name": "XXX",
            "is_restricted": false,
            "is_ultra_restricted": false
        },
        "blocks": [
            {
                "type": "rich_text",
                "block_id": "yf=A9"
            }
        ]
    }
]

So far I have:

import glob
import json

read_files = glob.glob("*.json")
output_list = []
all_items = []

for f in read_files:
    with open(f, "rb") as infile:
        output_list.append(json.load(infile))
    data = {}
    for obj in output_list:
        data['date'] = f
        data['text'] = 'text'            # placeholder: should be the message text
        data['real_name'] = 'real_name'  # placeholder: should be the nested real_name
        all_items.append(data)

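Once you have read the JSON object, just index into the dictionaries for the data. If your JSON data really is in a list in each file, you may need obj[0]['text'] and so on, but that seems odd, so I am assuming the sample you posted was pasted from output_list after you had collected the data. So, assuming your file content is exactly as follows: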

{
    "client_msg_id": "xxxx",
    "type": "message",
    "text": "THIS IS THE TEXT I WANT TO PULL",
    "user": "XXX",
    "user_profile": {
        "first_name": "XXX",
        "real_name": "THIS IS THE NAME I WANT TO PULL",
        "display_name": "XXX",
        "is_restricted": false,
        "is_ultra_restricted": false
    },
    "blocks": [
        {
            "type": "rich_text",
            "block_id": "yf=A9"
        }
    ]
}

test.py:

import glob
import json
from pathlib import Path

all_items = []
# Sort so the rows come out in date order (the filenames are ISO dates).
for f in sorted(glob.glob("*.json")):
    with open(f, "rb") as infile:
        obj = json.load(infile)
    # Build a fresh dict for every file. Reusing one dict, or looping over
    # everything loaded so far, would append duplicate rows once there is
    # more than one file.
    all_items.append({
        'date': Path(f).stem,  # "2021-01-18.json" -> "2021-01-18"
        'text': obj['text'],
        'real_name': obj['user_profile']['real_name'],
    })
print(all_items)

Output:

[{'date': '2021-01-18', 'text': 'THIS IS THE TEXT I WANT TO PULL', 'real_name': 'THIS IS THE NAME I WANT TO PULL'}]
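
The snippet above stops at a list of dictionaries, while the question asks for a single CSV file. Below is a rough sketch of one way to finish the job, not a definitive implementation. The function names extract_rows and json_dir_to_csv and the output name combined.csv are made up for illustration. It also assumes that a file may hold either one message object or a list of them (as in the question's sample), and that a message without user_profile or text should become an empty cell.

```python
import csv
import json
from pathlib import Path


def extract_rows(json_path):
    """Return one {'date', 'real_name', 'text'} dict per message in one file."""
    path = Path(json_path)
    with path.open(encoding="utf-8") as infile:
        payload = json.load(infile)
    # Accept either a single message object or a list of messages.
    messages = payload if isinstance(payload, list) else [payload]
    rows = []
    for msg in messages:
        # Assumption: a missing or null user_profile yields an empty name.
        profile = msg.get("user_profile") or {}
        rows.append({
            "date": path.stem,  # filename without the ".json" suffix
            "real_name": profile.get("real_name", ""),
            "text": msg.get("text", ""),
        })
    return rows


def json_dir_to_csv(json_dir, csv_path):
    """Merge every *.json file in json_dir into one CSV; return the row count."""
    rows = []
    for json_path in sorted(Path(json_dir).glob("*.json")):
        rows.extend(extract_rows(json_path))
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=["date", "real_name", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling json_dir_to_csv(".", "combined.csv") from the folder that holds the JSON files should then produce one row per message, with the date taken from each filename.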
