Converting text file to YAML in Python

I have a text file to convert to YAML format. Here are some notes to describe the problem a little better:

  • The sections within the file have differing numbers of subheadings.
  • The values of the subheadings can be of any data type (e.g. string, bool, int, double, datetime).
  • The file is approximately 2,000 lines long.

An example of the format is below:

file_content = '''
    Section section_1
        section_1_subheading1 = text
        section_1_subheading2 = bool
    end
    Section section_2
       section_2_subheading3 = int
       section_2_subheading4 = double
       section_2_subheading5 = bool
       section_2_subheading6 = text
       section_2_subheading7 = datetime
    end
    Section section_3
       section_3_subheading8 = numeric
       section_3_subheading9 = int
    end
'''

I have tried to convert the text to YAML format by:

  1. Replacing the equal signs with colons using regex.
  2. Replacing "Section section_name" with "section_name :".
  3. Removing the "end" line that closes each section.

However, I am having difficulty with #2 and #3. This is the text-to-YAML function I have created so far:

import yaml
import re

def convert_txt_to_yaml(file_content):
    """Converts a text file to a YAML file"""

    # Replace "=" with ":"
    file_content2 = file_content.replace("=", ":")

    # Split the lines 
    lines = file_content2.splitlines()

    # Define section headings to find and replace
    section_names = "Section "
    section_headings = r"(?<=Section )(.*)$"
    section_colons = r"\1 : "
    end_names = "end"

    # Convert to YAML format, line-by-line
    for line in lines:
        add_colon = re.sub(section_headings, section_colons, line) # Add colon to end of section name
        remove_section_word = re.sub(section_names, "", add_colon) # Remove "Section " in section header
        line = re.sub(end_names, "", remove_section_word)          # Remove "end" between sections

    # Join lines back together
    converted_file = "\n".join(lines)
    return converted_file

I believe the problem is within the for loop - I can't figure out why the section headers and endings aren't changing. Each converted line prints perfectly if I test it, but the changes to the lines themselves aren't being saved.

The output format I am looking for is the following:

file_content = '''
    section_1 :
        section_1_subheading1 : text
        section_1_subheading2 : bool
    section_2 :
        section_2_subheading3 : int
        section_2_subheading4 : double
        section_2_subheading5 : bool
        section_2_subheading6 : text
        section_2_subheading7 : datetime
    section_3 :
        section_3_subheading8 : numeric
        section_3_subheading9 : int
'''
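On why the loop in the function above leaves lines unchanged: assigning to `line` inside `for line in lines` only rebinds the loop variable. The list element is never replaced, so the final `"\n".join(lines)` still joins the original text. A minimal sketch of one way to keep the results is to collect them in a new list. The name `convert_lines` is hypothetical, and the patterns are anchored to whole lines on the assumption that `Section` and `end` always appear alone on their line, as in the example:

```python
import re

def convert_lines(file_content):
    """Rewrite 'Section x' / 'end' / 'key = value' lines into YAML-like text."""
    converted = []
    for line in file_content.splitlines():
        # Drop the "end" markers entirely (whole-line match only)
        if re.fullmatch(r"\s*end\s*", line):
            continue
        # "Section name" -> "name :" (keeps the leading indentation)
        line = re.sub(r"^(\s*)Section\s+(\S+)\s*$", r"\1\2 :", line)
        # "key = value" -> "key : value" (first "=" only)
        line = re.sub(r"\s*=\s*", " : ", line, count=1)
        # Store the result; rebinding `line` alone would not change the source list
        converted.append(line)
    return "\n".join(converted)

sample = """Section section_1
    section_1_subheading1 = text
    section_1_subheading2 = bool
end
Section section_2
    section_2_subheading3 = int
end"""
print(convert_lines(sample))
```

This is plain text substitution, so it does not check that the result is valid YAML; a value that itself contains a colon followed by a space, for example, would need quoting.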

I would rather parse the text into a dict and then dump it as YAML using the yaml package (PyYAML) in Python, as below:

import re
import yaml

def convert_txt_to_yaml(file_content):
    """Converts the sectioned text to a YAML string"""
    config_dict = {}

    # Split the lines
    lines = file_content.splitlines()
    section_title = None
    for line in lines:
        if not line.strip():
            # Blank line (splitlines() has already removed the "\n")
            continue
        elif re.match(r'\s*end\s*$', line):
            # End of section
            section_title = None
        elif re.match(r'\s*Section\s+\S', line):
            # Start of section
            match_obj = re.match(r'\s*Section\s+(\S+)', line)
            section_title = match_obj.group(1)
            config_dict[section_title] = {}
        elif section_title and re.match(r'\s*\S+\s*=', line):
            # "subheading = value" line inside the current section;
            # the full subheading name is kept as the key
            match_obj = re.match(r'\s*(\S+)\s*=\s*(.*?)\s*$', line)
            config_dict[section_title][match_obj.group(1)] = match_obj.group(2)
    return yaml.dump(config_dict)
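Note that the function above stores every value as a string. If typed values are wanted (the question mentions bool, int, double and datetime), one possible approach is to let PyYAML's scalar resolver guess each value's type. The sketch below is a variant under stated assumptions, not part of the original answer: `parse_sections` and `to_yaml` are hypothetical names, the sample values are invented for illustration, and `sort_keys=False` requires PyYAML 5.1 or later.

```python
import yaml

def parse_sections(text):
    """Parse 'Section name ... end' blocks into a nested dict with typed values."""
    data = {}
    current = None  # dict of the section being read, or None outside a section
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "end":
            current = None
        elif line.startswith("Section "):
            name = line[len("Section "):].strip()
            current = data.setdefault(name, {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            value = value.strip()
            try:
                # Let the YAML resolver guess the scalar type (int, float, bool, date, ...)
                parsed = yaml.safe_load(value)
            except yaml.YAMLError:
                parsed = value
            # Anything that resolves to a collection is kept as the raw string
            if isinstance(parsed, (dict, list)):
                parsed = value
            current[key.strip()] = parsed
    return data

def to_yaml(text):
    """Dump the parsed sections, keeping the order they appear in the file."""
    return yaml.safe_dump(parse_sections(text), sort_keys=False, default_flow_style=False)

sample = """
Section section_1
    name = some text
    enabled = true
end
Section section_2
    count = 42
    ratio = 3.14
    created = 2024-01-15
end
"""
result = parse_sections(sample)
print(to_yaml(sample))
```

The resolver follows PyYAML's own rules, so words such as yes, no, on and off also become booleans, and an empty value becomes None. If the file uses other conventions for its types, an explicit conversion table per subheading would be safer.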


 