
How can I format this Python regular expression?

I'm trying to parse data from a text file. Each record is an age followed by zero to three times, and the times are right-aligned. No matter how many times follow an age in the source data, I want to pad with None so that every record has three time slots. Ages and times are all space-separated, and each time has the format "mm:ss.dd" or "ss.dd". The age-and-times group can occur one or more times in a single line.

Here is some example data:

test_str = ['25',
    '24 22.10',
    '16 59.35 1:02.44',
    '18 52.78 59.45 1:01.22',
    '33 59.35 1:02.44 34 52.78 59.45 1:01.22 24 25']

Scanned, the above should produce tuples (or lists, dicts, whatever):

(25, None, None, None)
(24, None, None, 0:22.10)
(16, None, 0:59.35, 1:02.44)
(18, 0:52.78, 0:59.45, 1:01.22)
(33, None, 0:59.35, 1:02.44), (34, 0:52.78, 0:59.45, 1:01.22), (24, None, None, None), (25, None, None, None)

My thought was to use a regular expression, something along the lines of:

data_search = r'[1-9][0-9]( (([1-9][0-9]:)?[0-9]{2}.[0-9]{2})|){3}'
x = re.search(data_search, test_str[0])

But I haven't been successful.

Could somebody help me with the regex or suggest a better solution?
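
For reference, here is a small sketch of where the pattern above falls short on the sample data. The minutes part `[1-9][0-9]:` requires exactly two digits, so a time like `1:02.44` is never consumed. The `.` between seconds and hundredths is unescaped, so it matches any character. And `re.search` returns only the first match, so a line holding several records yields just one. The variable names `line` and `match` are only for illustration.

```python
import re

# The pattern from the question, unchanged.
data_search = r'[1-9][0-9]( (([1-9][0-9]:)?[0-9]{2}.[0-9]{2})|){3}'

line = '16 59.35 1:02.44'
match = re.search(data_search, line)

# The match covers the age and the first time only: '1:02.44' has a
# one-digit minutes field, which '[1-9][0-9]:' cannot match, so the
# remaining repetitions fall back to the empty alternative.
print(match.group(0))

# The unescaped '.' accepts any character, not just a literal dot.
loose = re.search(r'[0-9]{2}.[0-9]{2}', '59x35')
print(loose is not None)
```

Escaping the dot (`\.`) and allowing a variable number of minute digits (`\d+:`) addresses the first two points; `re.findall` or `re.finditer` addresses the third.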

I'm not sure if this is the best approach, but this splits off the first element, since the age is always in the first position, then splits the rest and fills the gaps with None.

test_str = ['25',
            '24 22.10',
            '16 59.35 1:02.44',
            '18 52.78 59.45 1:01.22']

def create_tuples(string_list):
    all_tuples = []
    for space_string in string_list:
        if not space_string:
            continue
        split_list = space_string.split()
        first_list_element = split_list[0]
        last_list_elements = split_list[1:]
        all_tuples.append([first_list_element] + [None] * (3 - len(last_list_elements)) + last_list_elements)
    return all_tuples

print(create_tuples(test_str))

# Returns:
[['25', None, None, None], ['24', None, None, '22.10'], ['16', None, '59.35', '1:02.44'], ['18', '52.78', '59.45', '1:01.22']]

I believe this is close to what you want. Sorry for the lack of regex.

def format_str(test_str):
    res = []
    for x in test_str:
        parts = x.split(" ")
        thing = []
        for part in parts:
            if len(thing) != 0 and '.' not in part and ':' not in part:
                res.append(thing[:1] + [None]*(4-len(thing)) + thing[1:])
                thing = [part]
            else:
                thing.append(part)
        if len(thing) != 0:
            res.append(thing[:1] + [None]*(4-len(thing)) + thing[1:])
    return res

test_str = ['25',
    '24 22.10',
    '16 59.35 1:02.44',
    '18 52.78 59.45 1:01.22 24 22.10']

results = format_str(test_str)
print(results)

The result is:

[['25', None, None, None], ['24', None, None, '22.10'], ['16', None, '59.35', '1:02.44'], ['18', '52.78', '59.45', '1:01.22'], ['24', None, None, '22.10']]

I didn't do any formatting on the times, so 52.78 isn't shown as 0:52.78, but I bet you can do that. If not, leave a comment and I'll edit in a solution for that too.
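
One possible way to add that formatting, as a minimal sketch: prefix `0:` to any time that has no minutes part. The helper names `normalize_time` and `normalize_row` are hypothetical, and the sketch assumes each row has the shape `[age, t1, t2, t3]` as in the result above.

```python
def normalize_time(value):
    """Return a time as 'm:ss.dd'; None (a padded slot) stays None."""
    if value is None:
        return None
    # Times without a minutes part get an explicit '0:' prefix.
    return value if ':' in value else '0:' + value


def normalize_row(row):
    """Normalize every time in a row of the form [age, t1, t2, t3]."""
    return [row[0]] + [normalize_time(t) for t in row[1:]]


rows = [['24', None, None, '22.10'],
        ['18', '52.78', '59.45', '1:01.22']]
print([normalize_row(r) for r in rows])
# [['24', None, None, '0:22.10'], ['18', '0:52.78', '0:59.45', '1:01.22']]
```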

>>> age_expr = r"(\d+)"
>>> time_expr = r"((?:\s+)(?:\d+:)?\d+\.\d+)?"
>>> expr = re.compile(age_expr + time_expr * 3)
>>> [expr.findall(s) for s in test_str]
[[('25', '', '', '')], [('24', ' 22.10', '', '')], [('16', ' 59.35', ' 1:02.44', '')], [('18', ' 52.78', ' 59.45', ' 1:01.22')], [('33', ' 59.35', ' 1:02.44', ''), ('34', ' 52.78', ' 59.45', ' 1:01.22'), ('24', '', '', ''), ('25', '', '', '')]]
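
The `findall` output above keeps the leading whitespace on each time, uses empty strings for missing times, and pads on the right. A sketch of one way to turn it into the right-aligned, None-padded tuples from the question follows. The function name `parse_line` is hypothetical, and the `0:` prefix for times without minutes is an assumption based on the expected output in the question.

```python
import re

age_expr = r"(\d+)"
time_expr = r"((?:\s+)(?:\d+:)?\d+\.\d+)?"
expr = re.compile(age_expr + time_expr * 3)


def parse_line(line):
    """Return a list of (age, t1, t2, t3) tuples, padded on the left with None."""
    records = []
    for age, *times in expr.findall(line):
        # Drop the empty groups and strip the captured leading whitespace.
        found = [t.strip() for t in times if t]
        # Give times without a minutes part an explicit '0:' prefix.
        found = [t if ':' in t else '0:' + t for t in found]
        # Right-align: missing times become None at the front.
        padded = [None] * (3 - len(found)) + found
        records.append((int(age), *padded))
    return records


print(parse_line('33 59.35 1:02.44 34 52.78 59.45 1:01.22 24 25'))
# [(33, None, '0:59.35', '1:02.44'), (34, '0:52.78', '0:59.45', '1:01.22'),
#  (24, None, None, None), (25, None, None, None)]
```

Because the pattern captures at most three times per age, a line with four or more consecutive times would not parse as intended; the sample data never does that.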

