
How to parse a specific block of a text file and export it as JSON with Python

I tried to parse a sample file (sample.txt) with the Python script below, but the result is not what I expected.

sample:

# Summary Report #######################

System time | 2020-02-27 15:35:32 UTC (local TZ: UTC +0000)
# Instances ##################################################
  Port  Data Directory             Nice OOM Socket
  ===== ========================== ==== === ======
                                   0    0
# Configuration File #########################################
              Config File | /etc/srv.cnf
[mysqld]
server_id            = 1
port                                = 3016
tmpdir                              = /tmp
performance_schema_instrument       = '%=on'
innodb_monitor_enable               = 'module_adaptive_hash'
innodb_monitor_enable               = 'module_buffer'

[client]
port                                = 3016

# management library ##################################
jemalloc is not enabled in mysql config for process with id 2425
# The End ####################################################

code.py

import json
import re

all_lines = open('sample.txt', 'r').readlines()

final_dict = {}
regex = r"^([a-zA-Z]+)(.)+="

config = 0 # not yet found config
for line in all_lines:
    if '[mysqld]' in line:
        final_dict['mysqld'] = {}
        config = 1
        continue
    if '[client]' in line:
        final_dict['client'] = {}
        config = 2
        continue

    if config == 1 and re.search(regex, line):
        try:
            clean_line = line.strip() # get rid of empty space
            k = clean_line.split('=')[0].rstrip() # get the key
            v = clean_line.split('=')[1].lstrip()
            final_dict['mysqld'][k] = v
        except Exception as e:
            print(clean_line, e)

    if config == 2 and re.search(regex, line):
        try:
            clean_line = line.strip() # get rid of empty space
            k = clean_line.split('=')[0].rstrip() # get the key
            v = clean_line.split('=')[1].lstrip()
            final_dict['client'][k] = v
        except Exception as e:
            print(clean_line, e)

print(final_dict)
print(json.dumps(final_dict, indent=4))

with open('my.json', 'w') as f:
    json.dump(final_dict, f, sort_keys=True)

The unexpected result:

{ "client": { "port": "3016" }, "mysqld": { "performance_schema_instrument": "'%", "server_id": "1", "innodb_monitor_enable": "'module_buffer'", "port": "3016", "tmpdir": "/tmp" } }

The expected result:

{
    "client": {
        "port": "3016"
    }, 
    "mysqld": {
        "performance_schema_instrument": "'%=on'", 
        "server_id": "1", 
        "innodb_monitor_enable": "'module_buffer','module_adaptive_hash'", 
        "port": "3016", 
        "tmpdir": "/tmp"
    }
}

Is it possible to achieve the above result?

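The standard-library configparser module is designed to parse INI-style configuration files such as this one. It splits each line on the first delimiter only, so a value like '%=on' is kept whole. Note that with strict=False a repeated key such as innodb_monitor_enable keeps only its last value.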

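import configparser
import json
import re

# Capture everything from the first "[section]" header after the
# "# Configuration File #" banner up to the "# management library #" banner.
regex_string = r'# Configuration File #.*?\n(\[.*?)# management library #'
with open('temp') as f:  # 'temp' holds the report text
    configuration_string = re.findall(regex_string, f.read(), re.DOTALL)[0]

# strict=False tolerates the repeated innodb_monitor_enable key.
c = configparser.RawConfigParser(strict=False)
c.read_string(configuration_string)

# Skip the implicit DEFAULT section and convert each section to a plain dict.
settings = {k: dict(v) for k, v in c.items() if k != 'DEFAULT'}
with open('temp.json', 'w') as f:
    json.dump(settings, f, sort_keys=True, indent=4)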
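
If the repeated innodb_monitor_enable values should be joined rather than overwritten, as in the expected result, a small hand-written parser is one option. The following is only a minimal sketch and not part of the original answer: the function name parse_config_block and the embedded SAMPLE string are made up for illustration, and it assumes that section headers look like [name] and that any line starting with # ends the configuration block. It splits each line on the first = only, so '%=on' survives, and it joins repeated keys with a comma in file order (the reverse of the order shown in the expected result).

```python
import json
import re

# Hypothetical stand-in for the contents of sample.txt (shortened).
SAMPLE = """\
# Summary Report #######################
System time | 2020-02-27 15:35:32 UTC (local TZ: UTC +0000)
# Instances ##################################################
  ===== ========================== ==== === ======
# Configuration File #########################################
              Config File | /etc/srv.cnf
[mysqld]
server_id            = 1
port                                = 3016
tmpdir                              = /tmp
performance_schema_instrument       = '%=on'
innodb_monitor_enable               = 'module_adaptive_hash'
innodb_monitor_enable               = 'module_buffer'

[client]
port                                = 3016

# management library ##################################
jemalloc is not enabled in mysql config for process with id 2425
# The End ####################################################
"""


def parse_config_block(text):
    """Return {section: {key: value}} for the INI-style part of the report."""
    sections = {}
    current = None  # dict of the section being filled, or None outside one
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            # Assumption: a "# ... ###" banner starts a new report block.
            current = None
            continue
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            current = sections.setdefault(header.group(1), {})
            continue
        if current is None or "=" not in line:
            continue
        # partition() splits on the first "=" only, so "'%=on'" stays intact.
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in current:
            # Repeated key: append instead of overwriting.
            current[key] += "," + value
        else:
            current[key] = value
    return sections


print(json.dumps(parse_config_block(SAMPLE), indent=4, sort_keys=True))
```

To use it on the real file, pass the text of sample.txt to parse_config_block instead of SAMPLE. The original script lost '=on' because split('=') cuts at every = and only the piece at index 1 was kept. It lost the first innodb_monitor_enable value because a plain dict assignment overwrites an existing key.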
