
Python - convert text file to dict and convert to json

How can I convert this text file to JSON? Ultimately, I'll be inserting the JSON blobs into a NoSQL database, but for now I plan to parse the text files, build a Python dict, and then dump it to JSON.

I think there must be a way to do this with a dict comprehension that I'm just not seeing (I'm new to Python).

Example of a file:

file_1.txt
[namespace1] => metric_A = value1
[namespace1] => metric_B = value2
[namespace2] => metric_A = value3
[namespace2] => metric_B = value4
[namespace2] => metric_B = value5

Example of the dict I want to build and then convert to JSON:

{
  "file1": {
    "namespace1": {
      "metric_A": "value1",
      "metric_B": "value2"
    },
    "namespace2": {
      "metric_A": "value3",
      "metric_B": ["value4", "value5"]
    }
  }
}

I currently have this working, but my code is a total mess (and much more complex than this example, with cleanup etc.). I'm basically going through the file line by line, building a Python dict. I check whether each namespace exists in the dict; if it does, I check the metric. If the metric already exists, I know I have duplicates and need to convert the value to a list that contains the existing value and the new value(s). There has to be a simpler, cleaner way.

import glob
import json

answer = {}
for fname in glob.glob('file_*.txt'):  # loop over all matching filenames
    answer[fname] = {}
    with open(fname) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # '[namespace1] => metric_A = value1'.split()[::2]
            #   -> ['[namespace1]', 'metric_A', 'value1']
            namespace, metric, value = line.split()[::2]
            namespace = namespace[1:-1]  # strip the surrounding brackets
            # setdefault inserts the inner dict the first time (dict.get would not)
            # note: a repeated metric overwrites its earlier value here
            answer[fname].setdefault(namespace, {})[metric] = value

required_json = json.dumps(answer)  # turn the dict into proper JSON
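Assigning with `[metric] = value` keeps at most one value per metric, so the repeated metric_B in namespace2 cannot become the list the question asks for. Below is a sketch that keeps the same split()-based parsing but stores the first value as a plain string and promotes the metric to a list on its second occurrence. The function name parse_lines is hypothetical, and an inline string stands in for file_1.txt so the example runs on its own.

```python
import json


def parse_lines(lines):
    """Parse '[ns] => metric = value' lines into {ns: {metric: value-or-list}}."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # '[namespace1] => metric_A = value1'.split()[::2]
        #   -> ['[namespace1]', 'metric_A', 'value1']
        namespace, metric, value = line.split()[::2]
        namespace = namespace[1:-1]  # drop the surrounding brackets
        metrics = result.setdefault(namespace, {})
        if metric not in metrics:
            metrics[metric] = value  # first occurrence: plain string
        elif isinstance(metrics[metric], list):
            metrics[metric].append(value)  # third or later occurrence
        else:
            metrics[metric] = [metrics[metric], value]  # second occurrence: promote
    return result


# Inline stand-in for the contents of file_1.txt
sample = """\
[namespace1] => metric_A = value1
[namespace1] => metric_B = value2
[namespace2] => metric_A = value3
[namespace2] => metric_B = value4
[namespace2] => metric_B = value5
"""

answer = {"file_1.txt": parse_lines(sample.splitlines())}
print(json.dumps(answer, indent=2))
```

Because a file object iterates line by line, the same function should also work inside the glob loop as `answer[fname] = parse_lines(infile)`.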

You can use a regex for this. re.findall(r'\w+', line) finds all the word groups you are after; the rest is storing them in a dictionary of dictionaries. The simplest way to do that is to use defaultdict from collections.
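A quick check of what the pattern returns for one line of the sample file. `\w` matches letters, digits and the underscore, so the brackets, `=>` and `=` act purely as separators. The second call uses a hypothetical value containing a dot to show a limitation: such a value splits into extra groups, so unpacking into exactly three names would fail.

```python
import re

line = "[namespace1] => metric_A = value1\n"
# Brackets, '=>' and '=' are not word characters, so they only separate groups
print(re.findall(r"\w+", line))
# ['namespace1', 'metric_A', 'value1']

# Hypothetical value with a dot: it is split into two groups
print(re.findall(r"\w+", "[ns] => load = 0.75"))
# ['ns', 'load', '0', '75']
```

If values can contain punctuation, a more specific pattern (for example, one capture group per field) would be needed instead of `\w+`.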

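import re
from collections import defaultdict

# outer: namespace -> inner defaultdict; inner: metric -> list of values
answer = defaultdict(lambda: defaultdict(list))

with open('file_1.txt', 'r') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, which would yield no groups
        namespace, metric, value = re.findall(r'\w+', line)
        answer[namespace][metric].append(value)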

Since we expect exactly three alphanumeric groups per line, we unpack them into three variables: namespace, metric, value. The outer defaultdict creates an inner defaultdict the first time a namespace is seen, and the inner defaultdict supplies an empty list for the first append, which keeps the code compact.
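The defaultdict version stores every value in a list, even for a metric that occurs only once, whereas the question's target keeps single values as plain strings. A sketch of a post-processing step, written as the nested dict comprehension the question was looking for, that unwraps single-element lists before dumping to JSON. The inline lines stand in for reading file_1.txt, and the "file_1.txt" key is an assumption about how files would be labelled.

```python
import json
import re
from collections import defaultdict

# Inline stand-in for the lines of file_1.txt
lines = [
    "[namespace1] => metric_A = value1",
    "[namespace1] => metric_B = value2",
    "[namespace2] => metric_A = value3",
    "[namespace2] => metric_B = value4",
    "[namespace2] => metric_B = value5",
]

answer = defaultdict(lambda: defaultdict(list))
for line in lines:
    namespace, metric, value = re.findall(r"\w+", line)
    answer[namespace][metric].append(value)

# Unwrap single-element lists so the shape matches the desired output
result = {
    namespace: {
        metric: values[0] if len(values) == 1 else values
        for metric, values in metrics.items()
    }
    for namespace, metrics in answer.items()
}

print(json.dumps({"file_1.txt": result}, indent=2))
```

Collecting everything as lists first and deciding on the final shape afterwards avoids the "does this key already hold a list?" check inside the parsing loop.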
