
Convert Python Dictionary Objects to JSON but Nest the JSON in a Parent Array

The script below parses the content of some Markdown files in a directory. It extracts the separate components of each file and places them into a dictionary, and then converts the dictionary to JSON.

from datetime import datetime
import glob
import json
import os
import re


def entries():
    entries = glob.glob("/home/user/temp/wiki/" + "*.md")
    regexp = r"^\s*(?:-{3})(.*?)(?:-{3})\s*(.+)$"

    for entry in entries:
        with open(entry, "r", encoding="utf-8") as file:
            file_content = file.read()

            try:
                # Regular expression to use:
                match = re.compile(regexp, re.S | re.M)

                # Find matches:
                result = match.search(file_content)

                # Convert frontmatter into dictionary:
                frontmatter = dict(re.findall(r"(.*): (.*)", result.group(1)))

                # Convert individual tags to list items:
                frontmatter["tags"] = frontmatter["tags"][1:-1].split(",")

                # Add content to dict:
                frontmatter["content"] = result.group(2)

                # Create JSON object:
                search_index = json.dumps(frontmatter, indent=4, default=str)

                print(search_index)

            except (AttributeError, KeyError):
                # AttributeError: no match (result is None); KeyError: no "tags" field
                print(f"Error: No YAML frontmatter found in '{entry}'")


entries()

When the script is run, it prints the following output:

{
    "id": "20210131141200",
    "title": "Nulla id feugiat mauris.",
    "date": "2021-01-31 14:12:00",
    "tags": [
        "nulla",
        " id"
    ],
    "content": "Fusce eu pulvinar velit. Praesent vel velit quis risus euismod pulvinar. Vestibulum nisl sapien, scelerisque vitae ornare ut, feugiat at tellus. Sed scelerisque tellus molestie, rhoncus neque eu, condimentum eros. Quisque sapien tellus, volutpat a gravida quis, iaculis et erat. \n\nQuisque porttitor euismod odio ut eleifend. In semper sagittis cursus. Donec iaculis blandit fringilla. Donec lobortis lectus orci, gravida blandit risus fermentum vitae. \n"
}
{
    "id": "20210202144523",
    "title": "Lorem ipsum dolor sit amet",
    "date": "2021-02-02 14:45:23",
    "tags": [
        "lorem",
        " ipsum"
    ],
    "content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi posuere turpis et mattis vehicula. Quisque consectetur purus auctor varius sagittis. Mauris cursus turpis ac massa luctus bibendum. \n\nDonec eu varius justo. Aliquam vel rutrum urna, in pellentesque mauris. Nulla eu mollis turpis. Duis lacinia laoreet tortor eget laoreet. Aenean vel velit lacus. Duis mollis eros sed ex cursus auctor. Quisque tristique metus nec ex sodales malesuada. In pharetra bibendum turpis vel auctor."
}
{
    "id": "20210201132608",
    "title": "Sed vulputate arcu eu iaculis auctor",
    "date": "2021-02-01 13:26:08",
    "tags": [
        "sed",
        " vulputate",
        " arcu",
        " eu"
    ],
    "content": "Proin ullamcorper massa enim, vel dignissim dui tempus at. Pellentesque nec metus quis massa sodales tempor. Fusce mauris lectus, hendrerit et rhoncus sit amet, aliquam non arcu. Aenean et velit sit amet neque malesuada consequat eu scelerisque magna. Aliquam varius maximus dolor non ullamcorper. Nullam interdum sed dolor eu iaculis.\n\nDuis vel cursus velit. Sed interdum massa nunc, vel aliquam magna placerat in. Vestibulum egestas magna ligula, ut fringilla erat faucibus eu. Phasellus luctus laoreet velit, et imperdiet magna tincidunt et. Nullam vitae diam at arcu faucibus consectetur commodo suscipit magna. Curabitur rhoncus in elit vitae vestibulum. Fusce luctus mattis fringilla. Curabitur feugiat tristique odio. \n"
}

This isn't quite the format I need. I'm trying to output the JSON exactly as shown below, but I'm not having much luck.

{
    "entries": [
        {
            "id": "20210131141200",
            "title": "Nulla id feugiat mauris.",
            "date": "2021-01-31 14:12:00",
            "tags": [
                "nulla",
                " id"
            ],
            "content": "Fusce eu pulvinar velit. Praesent vel velit quis risus euismod pulvinar. Vestibulum nisl sapien, scelerisque vitae ornare ut, feugiat at tellus. Sed scelerisque tellus molestie, rhoncus neque eu, condimentum eros. Quisque sapien tellus, volutpat a gravida quis, iaculis et erat. \n\nQuisque porttitor euismod odio ut eleifend. In semper sagittis cursus. Donec iaculis blandit fringilla. Donec lobortis lectus orci, gravida blandit risus fermentum vitae. \n"
        },
        {
            "id": "20210202144523",
            "title": "Lorem ipsum dolor sit amet",
            "date": "2021-02-02 14:45:23",
            "tags": [
                "lorem",
                " ipsum"
            ],
            "content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi posuere turpis et mattis vehicula. Quisque consectetur purus auctor varius sagittis. Mauris cursus turpis ac massa luctus bibendum. \n\nDonec eu varius justo. Aliquam vel rutrum urna, in pellentesque mauris. Nulla eu mollis turpis. Duis lacinia laoreet tortor eget laoreet. Aenean vel velit lacus. Duis mollis eros sed ex cursus auctor. Quisque tristique metus nec ex sodales malesuada. In pharetra bibendum turpis vel auctor."
        },
        {
            "id": "20210201132608",
            "title": "Sed vulputate arcu eu iaculis auctor",
            "date": "2021-02-01 13:26:08",
            "tags": [
                "sed",
                " vulputate",
                " arcu",
                " eu"
            ],
            "content": "Proin ullamcorper massa enim, vel dignissim dui tempus at. Pellentesque nec metus quis massa sodales tempor. Fusce mauris lectus, hendrerit et rhoncus sit amet, aliquam non arcu. Aenean et velit sit amet neque malesuada consequat eu scelerisque magna. Aliquam varius maximus dolor non ullamcorper. Nullam interdum sed dolor eu iaculis.\n\nDuis vel cursus velit. Sed interdum massa nunc, vel aliquam magna placerat in. Vestibulum egestas magna ligula, ut fringilla erat faucibus eu. Phasellus luctus laoreet velit, et imperdiet magna tincidunt et. Nullam vitae diam at arcu faucibus consectetur commodo suscipit magna. Curabitur rhoncus in elit vitae vestibulum. Fusce luctus mattis fringilla. Curabitur feugiat tristique odio. \n"
        }
    ]
}

Everything I've tried ( search_index = json.dumps({"entries": frontmatter}, indent=4, default=str) for example) gets close, but because the call is inside the loop, it outputs "entries": once per file instead of "wrapping" all the objects, as you can see below:

{
    "entries": {
        "id": "20210131141200",
        "title": "Nulla id feugiat mauris.",
        "date": "2021-01-31 14:12:00",
        "tags": [
            "nulla",
            " id"
        ],
        "content": "Fusce eu pulvinar velit. Praesent vel velit quis risus euismod pulvinar. Vestibulum nisl sapien, scelerisque vitae ornare ut, feugiat at tellus. Sed scelerisque tellus molestie, rhoncus neque eu, condimentum eros. Quisque sapien tellus, volutpat a gravida quis, iaculis et erat. \n\nQuisque porttitor euismod odio ut eleifend. In semper sagittis cursus. Donec iaculis blandit fringilla. Donec lobortis lectus orci, gravida blandit risus fermentum vitae. \n"
    }
}
{
    "entries": {
        "id": "20210202144523",
        "title": "Lorem ipsum dolor sit amet",
        "date": "2021-02-02 14:45:23",
        "tags": [
            "lorem",
            " ipsum"
        ],
        "content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi posuere turpis et mattis vehicula. Quisque consectetur purus auctor varius sagittis. Mauris cursus turpis ac massa luctus bibendum. \n\nDonec eu varius justo. Aliquam vel rutrum urna, in pellentesque mauris. Nulla eu mollis turpis. Duis lacinia laoreet tortor eget laoreet. Aenean vel velit lacus. Duis mollis eros sed ex cursus auctor. Quisque tristique metus nec ex sodales malesuada. In pharetra bibendum turpis vel auctor."
    }
}
{
    "entries": {
        "id": "20210201132608",
        "title": "Sed vulputate arcu eu iaculis auctor",
        "date": "2021-02-01 13:26:08",
        "tags": [
            "sed",
            " vulputate",
            " arcu",
            " eu"
        ],
        "content": "Proin ullamcorper massa enim, vel dignissim dui tempus at. Pellentesque nec metus quis massa sodales tempor. Fusce mauris lectus, hendrerit et rhoncus sit amet, aliquam non arcu. Aenean et velit sit amet neque malesuada consequat eu scelerisque magna. Aliquam varius maximus dolor non ullamcorper. Nullam interdum sed dolor eu iaculis.\n\nDuis vel cursus velit. Sed interdum massa nunc, vel aliquam magna placerat in. Vestibulum egestas magna ligula, ut fringilla erat faucibus eu. Phasellus luctus laoreet velit, et imperdiet magna tincidunt et. Nullam vitae diam at arcu faucibus consectetur commodo suscipit magna. Curabitur rhoncus in elit vitae vestibulum. Fusce luctus mattis fringilla. Curabitur feugiat tristique odio. \n"
    }
}
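The behaviour described above can be reproduced without any files. The sketch below uses two made-up stand-in dictionaries (not the real parsed entries) and suggests why the loop version cannot produce the wrapped format: each json.dumps call returns its own complete JSON document, so the printed stream as a whole is not one parseable document.

```python
import json

# Two stand-in dictionaries; the real script builds these from files.
frontmatters = [
    {"id": "20210131141200", "tags": ["nulla", " id"]},
    {"id": "20210202144523", "tags": ["lorem", " ipsum"]},
]

# Calling json.dumps once per iteration yields one document per entry.
documents = [
    json.dumps({"entries": fm}, indent=4, default=str) for fm in frontmatters
]
combined = "\n".join(documents)

# Each piece parses on its own...
each_is_valid = all(isinstance(json.loads(doc), dict) for doc in documents)

# ...but the concatenated stream is rejected as a single document.
try:
    json.loads(combined)
    combined_is_valid = True
except json.JSONDecodeError:
    combined_is_valid = False

print(each_is_valid, combined_is_valid)
```

So the wrapping has to happen once, outside the loop, after every dictionary has been built.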

For reference, the Markdown files are structured as below:

---
id: 20210131141200
title: Nulla id feugiat mauris.
date: 2021-01-31 14:12:00
tags: [nulla, id]
---

Fusce eu pulvinar velit. Praesent vel velit quis risus euismod pulvinar. Vestibulum nisl sapien, scelerisque vitae ornare ut, feugiat at tellus. Sed scelerisque tellus molestie, rhoncus neque eu, condimentum eros. Quisque sapien tellus, volutpat a gravida quis, iaculis et erat. 

Quisque porttitor euismod odio ut eleifend. In semper sagittis cursus. Donec iaculis blandit fringilla. Donec lobortis lectus orci, gravida blandit risus fermentum vitae. 

Rather than converting each element to JSON as it is created, gather all the elements first and then convert them to JSON all at once.

def entries():
    entries = glob.glob("/home/user/temp/wiki/" + "*.md")
    regexp = r"^\s*(?:-{3})(.*?)(?:-{3})\s*(.+)$"

    # Store entries to dump later
    entry_dicts = []

    for entry in entries:
        with open(entry, "r", encoding="utf-8") as file:
            file_content = file.read()

            try:
                ...  # your code as-is here

                # Do not create the JSON object, instead:
                entry_dicts.append(frontmatter)

            except (AttributeError, KeyError):
                # AttributeError: no match (result is None); KeyError: no "tags" field
                print(f"Error: No YAML frontmatter found in '{entry}'")

    entries_json = json.dumps({'entries': entry_dicts}, indent=4, default=str)
    print(entries_json)

entries()
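For anyone who wants to try the collect-then-dump approach end to end, here is a minimal sketch. The two regular expressions are the ones from the question; parse_entry, build_index, and the sample files written to a temporary directory are hypothetical names and data introduced only for this illustration, so treat it as one possible arrangement rather than the required one.

```python
import glob
import json
import os
import re
import tempfile

# Same patterns as in the question, compiled once up front.
FRONTMATTER_RE = re.compile(r"^\s*(?:-{3})(.*?)(?:-{3})\s*(.+)$", re.S | re.M)
FIELD_RE = re.compile(r"(.*): (.*)")


def parse_entry(text):
    """Hypothetical helper: return the entry dict, or None without frontmatter."""
    result = FRONTMATTER_RE.search(text)
    if result is None:
        return None
    frontmatter = dict(FIELD_RE.findall(result.group(1)))
    frontmatter["tags"] = frontmatter["tags"][1:-1].split(",")
    frontmatter["content"] = result.group(2)
    return frontmatter


def build_index(directory):
    """Hypothetical helper: collect every entry, then serialize once."""
    entry_dicts = []
    for path in sorted(glob.glob(os.path.join(directory, "*.md"))):
        with open(path, "r", encoding="utf-8") as file:
            entry = parse_entry(file.read())
        if entry is None:
            print(f"Error: No YAML frontmatter found in '{path}'")
            continue
        entry_dicts.append(entry)
    # The only json.dumps call, made after the loop has finished.
    return json.dumps({"entries": entry_dicts}, indent=4, default=str)


# Demo against a throwaway directory rather than /home/user/temp/wiki/.
with tempfile.TemporaryDirectory() as directory:
    samples = {
        "a.md": "---\nid: 1\ntitle: First\ntags: [x, y]\n---\n\nBody one.\n",
        "b.md": "---\nid: 2\ntitle: Second\ntags: [z]\n---\n\nBody two.\n",
        "c.md": "No frontmatter here.\n",
    }
    for name, text in samples.items():
        with open(os.path.join(directory, name), "w", encoding="utf-8") as file:
            file.write(text)
    index_json = build_index(directory)

print(index_json)
```

Keeping the parsing in its own function means the loop only decides whether to append, and the serialization step stays in one place.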

Rather than printing each search_index inside the loop, collect all the results in a single object. Something like:

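import glob
import json


def entries():
    results = []

    # Iterate over the files, not over the function's own name
    for entry in glob.glob("/home/user/temp/wiki/*.md"):
        result = dict()

        # do work: fill `result` from the file at `entry`

        results.append(result)

    return results


# Wrap the collected list once, then serialize it
print(json.dumps({"entries": entries()}, indent=4, default=str))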
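One detail worth keeping in mind with this pattern: a list passed straight to print() is shown as Python's repr, which is not JSON, so the collected results still need a single pass through json.dumps at the end. A small sketch with a made-up stand-in list (not real output from the script):

```python
import json

# Stand-in for the list that entries() would return.
results = [{"id": "20210131141200", "tags": ["nulla", " id"]}]

# What print(results) would show: Python repr with single quotes.
as_repr = str(results)

# Wrapping the list once and serializing it gives valid JSON text.
as_json = json.dumps({"entries": results}, indent=4, default=str)

try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(as_repr)
print(as_json)
print(repr_is_json)
```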
