
Convert a flat tab-delimited file into a nested JSON structure

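I need to convert a flat file in the format below into JSON. The input and the desired output are shown below. I came across the question "Create nested JSON from CSV", but my data has an extra field, level, which determines the nesting depth in the JSON output. Python pandas does have df.to_json, but I could not find a way to make it produce the output format I need. Any help would be appreciated.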

Input:

name    level   children    size
aaa 7   aaab    2952
aaa 7   aaac    251
aaa 7   aaad    222
aaab    8   xxx 45
aaab    8   xxy 29
aaab    8   xxz 28
aaab    8   xxa 4
aaac    8   ddd 7
aaac    8   xxt 4
aaac    8   xxu 1
aaac    8   xxv 1
ddd 9   ppp 4
ddd 9   qqq 2

Output:

{
 "name": "aaa",
 "size": 5000,
 "children":
    [
        {
        "name": "aaab",
        "size": 2952,
        "children": [
                  {"name": "xxx", "size": 45},
                  {"name": "xxy", "size": 29},
                  {"name": "xxz", "size": 28},
                  {"name": "xxa", "size": 4}
                  ]
        },

        {
        "name": "aaac",
        "size": 251,
        "children": [
                        {
                        "name": "ddd",
                        "size": 7,
                        "children": [
                                     {"name": "ppp", "size": 4},
                                     {"name": "qqq", "size": 2}
                                     ]
                        },
                        {"name": "xxt", "size": 4},
                        {"name": "xxu", "size": 1},
                        {"name": "xxv", "size": 1}
                     ]
        },
        {"name": "aaad","size": 222}
     ]
}

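This is easy to do with a two-pass approach. First, build a node for the child named on each line, plus one for the top-level parent, which never appears as a child. Then attach each node to its parent's list of children.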

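with open("data.txt") as f:
    #skip blank lines (e.g. from a trailing newline); they would break the unpacking below.
    lines = [line for line in f.read().splitlines() if line.strip()]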

#remove header line.
lines = lines[1:]

entries = {}

#create an entry for each child node.
for line in lines:
    name, level, child, size = line.split()
    entries[child] = {"name": child, "size": int(size), "children": []}

#we now have an entry for all nodes that are a child of another node.
#but not for the topmost parent node, so we'll make one for it now.
parents  = set(line.split()[0] for line in lines)
children = set(line.split()[2] for line in lines)
top_parent = (parents - children).pop()
#(just guess the size, since it isn't supplied in the file)
entries[top_parent] = {"name": top_parent, "size": 5000, "children": []}

#hook up each entry to its children
for line in lines:
    name, level, child, size = line.split()
    entries[name]["children"].append(entries[child])

#the nested structure is ready to use!
structure = entries[top_parent]

#display the beautiful result
import pprint
pprint.pprint(structure)

Result:

{'children': [{'children': [{'children': [], 'name': 'xxx', 'size': 45},
                            {'children': [], 'name': 'xxy', 'size': 29},
                            {'children': [], 'name': 'xxz', 'size': 28},
                            {'children': [], 'name': 'xxa', 'size': 4}],
               'name': 'aaab',
               'size': 2952},
              {'children': [{'children': [{'children': [],
                                           'name': 'ppp',
                                           'size': 4},
                                          {'children': [],
                                           'name': 'qqq',
                                           'size': 2}],
                             'name': 'ddd',
                             'size': 7},
                            {'children': [], 'name': 'xxt', 'size': 4},
                            {'children': [], 'name': 'xxu', 'size': 1},
                            {'children': [], 'name': 'xxv', 'size': 1}],
               'name': 'aaac',
               'size': 251},
              {'children': [], 'name': 'aaad', 'size': 222}],
 'name': 'aaa',
 'size': 5000}
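
The question asks for JSON, while pprint shows a Python dict (single quotes, keys sorted alphabetically). A minimal sketch of the serialization step, assuming the standard json module is acceptable; the small structure dict here is a stand-in with the same shape as the one built above, and output.json is a placeholder file name:

```python
import json

# Small stand-in for the `structure` dict built above (same shape, fewer nodes).
structure = {
    "name": "aaa",
    "size": 5000,
    "children": [
        {"name": "aaab", "size": 2952, "children": []},
        {"name": "aaad", "size": 222, "children": []},
    ],
}

# json.dumps turns the nested dicts and lists into a JSON string;
# indent=1 pretty-prints it with one space per nesting level.
json_text = json.dumps(structure, indent=1)
print(json_text)

# To write to a file instead ("output.json" is a placeholder name):
# with open("output.json", "w") as out:
#     json.dump(structure, out, indent=1)
```
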

Edit: you can use the del statement to remove the children attribute from leaf nodes.

#execute this after the "hook up each entry to its children" section.
#remove "children" from leaf nodes.
#values() works on both Python 2 and 3; itervalues() exists only on Python 2.
for entry in entries.values():
    if not entry["children"]:
        del entry["children"]

Result:

{'children': [{'children': [{'name': 'xxx', 'size': 45},
                            {'name': 'xxy', 'size': 29},
                            {'name': 'xxz', 'size': 28},
                            {'name': 'xxa', 'size': 4}],
               'name': 'aaab',
               'size': 2952},
              {'children': [{'children': [{'name': 'ppp', 'size': 4},
                                          {'name': 'qqq', 'size': 2}],
                             'name': 'ddd',
                             'size': 7},
                            {'name': 'xxt', 'size': 4},
                            {'name': 'xxu', 'size': 1},
                            {'name': 'xxv', 'size': 1}],
               'name': 'aaac',
               'size': 251},
              {'name': 'aaad', 'size': 222}],
 'name': 'aaa',
 'size': 5000}
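
An alternative to deleting empty children lists afterwards is to add the key only when a node gets its first child. The sketch below is one possible single-pass variant, not the method used above. It assumes the file really is tab-delimited with the header row shown in the question; build_tree and root_size are hypothetical names introduced for illustration. As in the code above, the level column is not needed, because the name/children pairs already define the tree.

```python
import csv
import io
import json

def build_tree(text, root_size=None):
    """Build a nested dict from tab-delimited rows: name, level, children, size."""
    nodes = {}           # node name -> node dict
    child_names = set()  # every name that appears in the "children" column
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        # Fetch or create both ends of the parent -> child link.
        parent = nodes.setdefault(row["name"], {"name": row["name"]})
        child = nodes.setdefault(row["children"], {"name": row["children"]})
        child["size"] = int(row["size"])
        # "children" is created only when a node gets its first child,
        # so leaf nodes never carry an empty list.
        parent.setdefault("children", []).append(child)
        child_names.add(row["children"])
    # The root is the only node that is never listed as a child.
    roots = [node for name, node in nodes.items() if name not in child_names]
    if len(roots) != 1:
        raise ValueError("expected exactly one root, found %d" % len(roots))
    root = roots[0]
    if root_size is not None:
        # The file gives no size for the root, so the caller must supply one.
        root["size"] = root_size
    return root

# A few rows from the question's input, with explicit tab characters.
sample = "\n".join([
    "name\tlevel\tchildren\tsize",
    "aaa\t7\taaab\t2952",
    "aaa\t7\taaad\t222",
    "aaab\t8\txxx\t45",
])
tree = build_tree(sample, root_size=5000)
print(json.dumps(tree, indent=1))
```

Because nodes are looked up by name, this variant also works if a row for a parent appears before the row that introduces it as a child, although the key order inside each node may then differ.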

