Emulating strptime behavior in Python

Question

I have a Python program that takes files from many sources. All files from the same source share one format, but the formats vary greatly between sources. One source might use ServerName - ProcessID - Date, while another uses (Date)_Username_ProcessID_Server. Currently, adding a new source with a new format requires a programmer to write a parse function for that source.

I've started writing a new adapter, and I'd like to store the file format as a string. For example, the first one would be %S - %P - %D, and the second could be (%D) %U %P_%S.

What would be the best approach for this in Python 3?

Answer 1

Something like this would be reasonable:

import re
from collections import namedtuple

Format = namedtuple('Format', 'name format_string regex')
class Parser(object):
    replacements = [Format('server', '%S', r'[A-Za-z0-9]+'),
                    Format('user', '%U', r'[A-Za-z0-9]+'),
                    Format('date', '%D', r'[0-9]{4}-[0-9]{2}-[0-9]{2}'),
                    Format('process_id', '%P', r'[0-9]+'),
                    ]

    def __init__(self, format):
        self.format = format
        self.re = re.compile(self._create_regex(format))

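    def _create_regex(self, format):
        # Escape the literal text so characters such as ( ) and - match themselves.
        format = re.escape(format)
        for replacement in self.replacements:
            # Search for the escaped form of the token: older Pythons escape
            # '%' as '\%', while Python 3.7+ leaves it unchanged.
            format = format.replace(re.escape(replacement.format_string),
                                    r'(?P<%s>%s)' % (replacement.name,
                                                     replacement.regex,
                                                     ),
                                    )
        return format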

    def parse(self, data):
        match = self.re.match(data)
        if match:
            return match.groupdict()
        return None

Usage:

a_parser = Parser("(%D)%U_%P_%S")
print(a_parser.parse("(2005-04-12)Jamie_123_Server1"))

b_parser = Parser("%S - %P - %D")
print(b_parser.parse("Server1 - 123 - 2005-04-12"))

Output:

{'date': '2005-04-12', 'user': 'Jamie', 'process_id': '123', 'server': 'Server1'}
{'server': 'Server1', 'process_id': '123', 'date': '2005-04-12'}
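Every value in the parsed dictionaries is still a string. If typed values are wanted, one possible follow-up step is a per-field converter table. This is only a sketch: CONVERTERS and convert_fields are hypothetical names that are not part of the answer, and the date layout is assumed to be the YYYY-MM-DD form matched by the %D regex.

```python
from datetime import date, datetime

# Hypothetical converter table keyed by field name; fields without an
# entry are left as strings.
CONVERTERS = {
    "date": lambda text: datetime.strptime(text, "%Y-%m-%d").date(),
    "process_id": int,
}

def convert_fields(fields):
    """Return a copy of a parsed dict with known fields converted."""
    return {name: CONVERTERS.get(name, str)(value)
            for name, value in fields.items()}

raw = {"date": "2005-04-12", "process_id": "123",
       "user": "Jamie", "server": "Server1"}
typed = convert_fields(raw)
print(typed)
```

Because the %D regex only checks the digit layout, a string such as 2005-99-99 would parse but then raise ValueError inside datetime.strptime, so callers may want to catch that.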

Essentially, I'm creating a mapping between the %? strings in your custom format syntax and a predefined regular expression to match that parameter, then replacing the %? strings in the given format string with the corresponding regex to build a parser for that pattern.
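To make the mapping concrete, here is a standalone sketch of the same idea that builds the regex by splitting the format string on its tokens rather than by string replacement. TOKENS and build_pattern are names assumed for illustration; the sub-patterns are copied from the answer's replacements list.

```python
import re

# Token table mirroring the answer's replacements list (the names are assumptions).
TOKENS = {
    "%S": ("server", r"[A-Za-z0-9]+"),
    "%U": ("user", r"[A-Za-z0-9]+"),
    "%D": ("date", r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    "%P": ("process_id", r"[0-9]+"),
}

def build_pattern(fmt):
    """Translate a format string into regex source text."""
    parts = []
    # The capture group makes re.split keep the tokens in the result.
    for piece in re.split(r"(%[SUDP])", fmt):
        if piece in TOKENS:
            name, regex = TOKENS[piece]
            parts.append("(?P<%s>%s)" % (name, regex))
        else:
            parts.append(re.escape(piece))  # literal text between tokens
    return "".join(parts)

pattern = build_pattern("%S - %P - %D")
print(pattern)
match = re.fullmatch(pattern, "Server1 - 123 - 2005-04-12")
print(match.groupdict())
```

Escaping only the literal pieces avoids depending on how re.escape treats the % character, which differs between Python versions.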

This will only work if the characters that delimit a "type" in the format string don't appear in its regex, or, if there's no delimiter, if the two regexes that sit side by side don't "interfere" with each other. For example, with the format string:

%U%P

And the regexes I've assigned to user and process_id above, it's impossible to tell where user ends and process_id starts in this string:

User1234

Is that User1 and 234, or User and 1234, or any other combination? But then, even a human can't work that out!
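The regex engine does not refuse the ambiguous input; it silently picks one split. The sketch below shows which one, using the two sub-patterns from the answer placed back to back, and then a hypothetical narrower rule (letters-only user names, an assumption not made in the question) that removes the ambiguity.

```python
import re

# The answer's %U and %P sub-patterns with no delimiter between them.
ambiguous = re.compile(r"(?P<user>[A-Za-z0-9]+)(?P<process_id>[0-9]+)")
greedy = ambiguous.fullmatch("User1234").groupdict()
print(greedy)    # {'user': 'User123', 'process_id': '4'}

# Hypothetical narrower rule: user names contain letters only.
letters_only = re.compile(r"(?P<user>[A-Za-z]+)(?P<process_id>[0-9]+)")
narrowed = letters_only.fullmatch("User1234").groupdict()
print(narrowed)  # {'user': 'User', 'process_id': '1234'}
```

The first group is greedy, so it takes as much as it can and leaves process_id a single digit, which is unlikely to be what the source intended.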

Answer 1
2 · Accepted · 2014-07-14 14:15:28