Is ConfigParser slow? ConfigParser vs. an ugly algorithm

I am trying to improve a function I wrote to parse a bunch of INI files into a single JSON file. When I wrote it I was a newbie and didn't know about the configparser module, so now I want to take advantage of it.

INPUT: an INI file (parent) containing a long set of references to other INI files (children).

GOAL: convert the parent INI file to JSON and include some info taken from all the child files in it. In other words: take some info from a set of INI files and export it into a single JSON file.

Question: I expected my new code to be at least as fast as the old one, but it isn't: it is 2 times slower. Why is that? Is it ConfigParser or is it me? Performance is really important: my new function takes one second to parse around 900 INI files, while the old one takes half a second.

Parent Example

(it can have anywhere from hundreds to tens of thousands of lines):

[General]
Name = parent
...
   
[Item 000001]
Name = first item
path = "path/to/child_1.ini"
...

[Item 000002]
Name = second item
...

[...]

[Item 001000]
Name = thousandth item
...   

Child Example

(it can have anywhere from fewer than 100 lines to about 200):

[General]
Name = name
ID = 12345
...

[Options]
...

JSON Output Example

{
    "Parent": {
        "Name": "parent",
        "Count": "1000",
        [...]
        "child1": {
            "Name": "name",
            "ID": "12345",
            "Option1": "...",
            "Option2": "...",
            "Option3": "..." 
        },
        "child2": {
            "Name": "name2",
            "ID": "22222",
            "Option1": "...",
            "Option2": "...",
            "Option3": "..." 
        },
        [...]
        "child1000": {
            "Name": "name1000",
            "ID": "12332",
            "Option1": "...",
            "Option2": "...",
            "Option3": "..." 
        }
    }
}

OLD CODE

def split_string_by_equal(string):
    str_operands = string.split(' = ')
    first_part = (str_operands[0]).strip()
    second_part = (' '.join(str_operands[1:])).strip()
    return [first_part, second_part]

def parse_ini_to_json(path):
    parent_dict = {}
    child_dict = {}
    num_child = 1
    parent_directory = os.path.dirname(path)
    with open(path, 'r') as parent_file:
        for line in parent_file:
            left_part = split_string_by_equal(line)[0]
            right_part = split_string_by_equal(line)[1]
            if left_part in SOME_WORDS:
                parent_dict.update({left_part: do_something(parent_directory, right_part)})
            elif left_part == 'Count':
                parent_dict.update({'Count': right_part})
            elif left_part == 'JohnDoe':
                parent_dict['JohnDoe'] = right_part
            elif 'Item' in line:
                if child_dict:
                    parent_dict.update({'child{}'.format(num_child): child_dict})
                    child_dict = {}
                    num_child += 1
            elif left_part in SOME_OTHER_WORDS:
                child_dict.update({left_part: right_part})
            if left_part == 'path':
                child_dict.update(extract_data_from_child(right_part))
    if child_dict:
        parent_dict.update({'child{}'.format(num_child): child_dict})
    return parent_dict

def extract_data_from_child(path):
    """ same methodology used in above function """
    [...]
    return child_dict

NEW CODE

def get_config_parser(path):
    config = configparser.ConfigParser()
    config.optionxform = str
    config.read(path)
    return config

def parse_ini_to_json(path):
    config = get_config_parser(path)
    parent_directory = os.path.dirname(path)
    parent_dict = {}
    for key in config['Folders'].keys():
        parent_dict[key] = do_something( parent_directory, config['Folders'][key])
    parent_dict['Count'] = config['General']['Count']
    parent_dict['JohnDoe'] = config['General']['JohnDoe']
    counter = 1
    for key in config.keys():
        if 'Item' in key:
            child_dict = {}
            for child_prop in config[key].keys():
                if child_prop in SOME_WORDS:
                    child_dict[child_prop] = config[key][child_prop]
            child_path = config[key]['path']
            child_dict.update(extract_data_from_child(child_path))
            parent_dict[f'child{counter}'] = child_dict
            counter += 1
    return parent_dict


def extract_data_from_child(path):
    config = get_config_parser(path)
    child_dict = {}
    for key in config['General'].keys():
        if key in SOME_KEYWORDS:
            child_dict[key] = config['General'][key]
    for key in config['Levels'].keys():
        if key in SOME_OTHER_KEYWORDS:
            child_dict[key] = config['Levels'][key]
    try:
        some_value = config['Levels']['SomeKey']
    except KeyError:
        pass
    for key in config['Options'].keys():
        value = config['Options'][key]
        key_enabled = key.strip() + 'enabled'
        try:
            if config['Options'][key_enabled] == '0':
                continue
        except KeyError:
            continue
        if 'false' in value:
            value = '0'
        elif 'true' in value:
            value = '1'
        child_dict[key] = value
    return child_dict 

I expected my new code to be at least as fast as the old one, but it isn't: it is 2 times slower. Why is that? Is it ConfigParser or is it me?

It's both. To understand where time is being spent in your code and in ConfigParser, you should look at using a profiler. https://docs.python.org/3/library/profile.html is a good starting point for learning how to profile code.
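As a rough illustration of that advice, here is a minimal sketch of profiling a ConfigParser-based parse with the standard library's cProfile and pstats modules. The helper names (make_sample_inis, parse_all, profile_call) and the generated sample files are hypothetical stand-ins, not the code from the question; swap in your own parse_ini_to_json and real parent file to see where your time actually goes.

```python
import configparser
import cProfile
import io
import pstats
import tempfile
from pathlib import Path


def make_sample_inis(directory, count=50):
    """Write `count` small INI files (hypothetical stand-ins for the child files)."""
    paths = []
    for i in range(count):
        p = Path(directory) / f"child_{i}.ini"
        p.write_text(
            "[General]\nName = name\nID = 12345\n\n"
            "[Options]\nOption1 = true\nOption1enabled = 1\n"
        )
        paths.append(p)
    return paths


def parse_all(paths):
    """Parse every file with ConfigParser and collect its [General] section."""
    result = {}
    for p in paths:
        config = configparser.ConfigParser()
        config.optionxform = str  # keep key case, as in the question
        config.read(p)
        result[p.name] = dict(config["General"])
    return result


def profile_call(func, *args, top=10):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    parsed, report = profile_call(parse_all, make_sample_inis(tmp))

print(report)
```

The report lists the functions with the highest cumulative time first, so you can see how much of the run is spent inside configparser itself (reading files, resolving values) and how much in your own loops. Running the old and new implementations through the same helper gives a like-for-like comparison.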
