Python - Convert JSON-Formatted .txt file to python dictionary

I have a .txt file that is structured somewhat like a JSON dictionary tree, and I intend to convert it to JSON using Python. I believe that to do this, I first need to convert the file into a Python dictionary, which is where I am having trouble. The structure of the .txt file is shown below. This is not the entire file; it may continue for a while like this with nested dictionaries.

 outer:
     middle:
          identity:        data1                                           
          types:           data2
          name:            data3
          region:          data4
          motion:          data5
          geometry_motion: data6
          roughness:
             height:       data7
             constant:     data8                                         
          velocity:
             types:        data9
             value:        data10

The output should eventually be JSON, but I'm more concerned with getting it into a Python dict that looks something like this:

{'outer': {'middle': {'identity': data1, 'types': data2, etc.}}}

My attempts so far have used the readlines() method to convert the file to a list of its lines, and then split each line at the colon using line.split(':'). The following code shows this.

with open(datafile) as file:
    lines = file.readlines()
    lines = [line.strip().split(':', 1) for line in lines]

output:
[['outer', ''], ['middle', ''], ['identity', '        data1'], etc.]

I then tried to iterate over the lines: if the second element in a line was '', the first element in that line would become the key for a new dict containing the items at the next level of indentation. This is where I have gotten quite stuck. I have toyed with the idea of a recursive function that calls itself every time a new nested dict must be made, but I haven't gotten anywhere with that. Here is an attempt at some code, which does not work for a number of reasons but may give some insight into my thought process.

data_dict = {}
i = 0
def recurse(i):
    try: 
        elements = lines[i]
    except IndexError: # return the dict once the list runs out of elements
        return data_dict
    if elements[1] == '':
        i += 1
        data_dict[[elements[0]]] = recurse(i)
    else: # if there is a second element in the list, make those key-value pairs in data_dict
        k, v = [element.strip() for element in elements]
        data_dict[k] = v  
        i += 1
        recurse(i)

Please feel free to provide any advice or suggestions that would send me in the right direction.

This is my first question on Stack Overflow, and I understand that I may have left out some valuable information. Please let me know if there's anything else I can do or provide to help solve this problem.

This text is valid YAML. You can use the yaml package (PyYAML, a third-party package) to read it from a file or parse it from a string, and get a Python dictionary. After that, you can use the json module to serialize the dictionary into JSON.

import yaml
import json

with open('test.yml', 'r') as yaml_file:
    doc = yaml.load(yaml_file, Loader=yaml.FullLoader)
print(doc)
----------
{'outer': {'middle': {'identity': 'data1', 'types': 'data2', 'name': 'data3', 'region': 'data4', 'motion': 'data5', 'geometry_motion': 'data6', 'roughness': {'height': 'data7', 'constant': 'data8'}, 'velocity': {'types': 'data9', 'value': 'data10'}}}}

That dictionary can be written as JSON with json.dump:

with open('test.json', 'w') as json_file:
    json.dump(doc, json_file)
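If the JSON should be human-readable, json.dump and json.dumps both accept an indent argument. The following is a minimal sketch using only the standard library; the small doc dictionary is a hypothetical stand-in for the one loaded from the file.

```python
import json

# Hypothetical stand-in for the dictionary loaded from the file.
doc = {'outer': {'middle': {'identity': 'data1',
                            'roughness': {'height': 'data7'}}}}

# indent=2 pretty-prints the nested structure, one key per line.
text = json.dumps(doc, indent=2)
print(text)

# Parsing the JSON text again gives back an equal dictionary.
restored = json.loads(text)
```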

The answer by Panagiotis Kanavos ("use Python's yaml package") is probably the best. Nevertheless, it might be instructive to try solving it without yaml.

I think one key problem in your approach is that you ignore the indentation, which means that

a:
  b: x
c: y

results in the same lines list as

a:
  b: x
  c: y

even though the two should produce different tree structures.
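A quick way to check this is to run both snippets through the same strip-and-split step used in the question. This is only a sketch: parse_ignoring_indentation is a made-up helper name, and the two strings are the snippets above.

```python
# Two snippets that differ only in the indentation of the last line.
text_a = "a:\n  b: x\nc: y\n"
text_b = "a:\n  b: x\n  c: y\n"

def parse_ignoring_indentation(text):
    """Mimic the question's approach: strip each line, then split at the first colon."""
    return [line.strip().split(':', 1) for line in text.splitlines()]

lines_a = parse_ignoring_indentation(text_a)
lines_b = parse_ignoring_indentation(text_b)

print(lines_a)
print(lines_a == lines_b)  # the nesting information has been lost
```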

Another problem is that you do not tell the recursive call which dictionary the new values should be put into.

I tried to build a solution similar to your attempt. Here it is:

lines = []
with open('data.txt', 'r') as fp:
    for line in fp:
        if not line.strip():
            continue  # skip blank lines
        # split at the first colon only, so values may contain colons
        left, right = line.split(':', 1)
        indentation = len(left) - len(left.lstrip())
        lines.append((indentation, left.strip(), right.strip()))

def fill_dictionary(dictionary, i, previous_indentation):
    """Fill dictionary from lines[i:] and return the index of the first unconsumed line."""
    j = i
    while j < len(lines):
        indentation, key, val = lines[j]
        if indentation <= previous_indentation:
            return j   # go one level up
        elif not val:  # go one level deeper
            dictionary[key] = {}
            j = fill_dictionary(dictionary[key], j+1, indentation)
        else:          # just enter the value
            dictionary[key] = val
            j += 1
    return j

result = {}
fill_dictionary(result, 0, -1)
print(result)
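The same idea can also be written without recursion by keeping an explicit stack of the dictionaries that are currently open. The following is a sketch of that alternative, not the code above: parse_indented and the shortened sample text are made up for illustration, and it assumes that indentation uses spaces and that every non-blank line contains a colon. The last line serializes the result with json.dumps, which completes the conversion to JSON asked for in the question.

```python
import json

def parse_indented(text):
    """Build a nested dict from 'key: value' lines that are nested by indentation."""
    root = {}
    # Each stack entry is (indentation of the key that opened the dict, the dict).
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        left, _, right = line.partition(':')
        indentation = len(left) - len(left.lstrip())
        key, val = left.strip(), right.strip()
        # Close every dict whose opening key is at this indentation or deeper.
        while stack[-1][0] >= indentation:
            stack.pop()
        parent = stack[-1][1]
        if val:
            parent[key] = val          # plain key-value pair
        else:
            parent[key] = {}           # key with no value opens a nested dict
            stack.append((indentation, parent[key]))
    return root

# Shortened, hypothetical sample in the same shape as the question's file.
sample = """\
outer:
    middle:
        identity: data1
        roughness:
            height: data7
        name: data3
"""

result = parse_indented(sample)
print(json.dumps(result, indent=2))
```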
