Converting text file to YAML in Python

I have a text file to convert to YAML format. Here are some notes to describe the problem a little better:

  • The sections within the file each have a different number of subheadings.
  • The values of the subheadings can be any data type (e.g. string, bool, int, double, datetime).
  • The file is approximately 2,000 lines long.

An example of the format is below:

file_content = '''
    Section section_1
        section_1_subheading1 = text
        section_1_subheading2 = bool
    end
    Section section_2
       section_2_subheading3 = int
       section_2_subheading4 = double
       section_2_subheading5 = bool
       section_2_subheading6 = text
       section_2_subheading7 = datetime
    end
    Section section_3
       section_3_subheading8 = numeric
       section_3_subheading9 = int
    end
'''

I have tried to convert the text to YAML format by:

  1. Replacing the equal signs with colons using regex.
  2. Replacing "Section section_name" with "section_name:".
  3. Removing the "end" marker between sections.

However, I am having difficulty with #2 and #3. This is the text-to-YAML function I have created so far:

import yaml
import re

def convert_txt_to_yaml(file_content):
    """Converts a text file to a YAML file"""

    # Replace "=" with ":"
    file_content2 = file_content.replace("=", ":")

    # Split the lines 
    lines = file_content2.splitlines()

    # Define section headings to find and replace
    section_names = "Section "
    section_headings = r"(?<=Section )(.*)$"
    section_colons = r"\1 : "
    end_names = "end"

    # Convert to YAML format, line-by-line
    for line in lines:
        add_colon = re.sub(section_headings, section_colons, line) # Add colon to end of section name
        remove_section_word = re.sub(section_names, "", add_colon) # Remove "Section " in section header
        line = re.sub(end_names, "", remove_section_word)          # Remove "end" between sections

    # Join lines back together
    converted_file = "\n".join(lines)
    return converted_file

I believe the problem is within the for loop: I can't figure out why the section headers and endings aren't changing. Each converted line prints correctly if I test it, but the changes to the lines themselves aren't being saved.

The output format I am looking for is the following:

file_content = '''
    section_1 :
        section_1_subheading1 : text
        section_1_subheading2 : bool
    section_2 :
        section_2_subheading3 : int
        section_2_subheading4 : double
        section_2_subheading5 : bool
        section_2_subheading6 : text
        section_2_subheading7 : datetime
    section_3 :
        section_3_subheading8 : numeric
        section_3_subheading9 : int
'''

I would rather convert it to a dict and then format it as YAML using the yaml package in Python, as below:

import re
import yaml

def convert_txt_to_yaml(file_content):
    """Converts the text file content to a YAML string"""
    config_dict = {}

    # Split the lines (splitlines() already drops the trailing "\n")
    lines = file_content.splitlines()
    section_title = None
    for line in lines:
        if not line.strip():
            # Skip blank lines
            continue
        elif re.match(r"\s*end\s*$", line):
            # End of section
            section_title = None
        elif re.match(r"\s*Section\s+\S", line):
            # Start of section
            match_obj = re.match(r"\s*Section\s+(\S+)", line)
            section_title = match_obj.group(1)
            config_dict[section_title] = {}
        elif section_title:
            # "key = value" line whose key starts with "<section_title>_";
            # keep the full key name and strip whitespace around the value
            pattern = r"\s*({}_\S*)\s*=\s*(.*?)\s*$".format(re.escape(section_title))
            match_obj = re.match(pattern, line)
            if match_obj:
                config_dict[section_title][match_obj.group(1)] = match_obj.group(2)
    return yaml.dump(config_dict)
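
As for why the loop in the question appears to do nothing: assigning to line inside "for line in lines" only rebinds the loop variable. The lines list itself is never modified, so "\n".join(lines) joins the unconverted lines. Below is a minimal sketch of the line-by-line approach that collects the converted lines into a new list instead. The function name convert_lines and the anchored patterns are illustrative choices, not taken from the question, and the sketch assumes every section header has the form "Section name" and every marker line contains only "end".

```python
import re

def convert_lines(file_content):
    """Rewrite 'Section name ... end' blocks as YAML-style text, line by line."""
    converted = []
    for line in file_content.splitlines():
        if re.fullmatch(r"\s*end\s*", line):
            # Drop the "end" marker entirely instead of leaving a blank line
            continue
        # "Section name" -> "name :", keeping the leading indentation
        line, n_subs = re.subn(r"^(\s*)Section\s+(\S+)\s*$", r"\1\2 :", line)
        if n_subs == 0:
            # Ordinary "key = value" line: replace only the first "="
            line = re.sub(r"\s*=\s*", " : ", line, count=1)
        # Append to a new list; rebinding `line` alone would be lost
        converted.append(line)
    return "\n".join(converted)
```

The returned text can then be passed to yaml.safe_load to check that it parses. Matching "end" against the whole line avoids touching values that merely contain those letters, which a plain re.sub("end", "", line) would alter.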
