Split string based on regex pattern

I have a message that I am trying to split:

```python
import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r'[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC', message)

print(split_message)
```

Expected output:

["This is update 1", "This is update 2", "This is update 3"]

Actual output:

['', '10', '17', 'This is update 1.', '10', '15', 'This is update 2.', '10', '15', 'This is update 3.']

Not sure what I am missing.

You are using capturing groups, which is why their contents also appear in the resulting list. Use non-capturing groups (which begin with ?:) instead:

```python
import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC", message)

print(split_message)
```

You will, however, always get an empty first entry, because your string starts with a match of the split pattern, and the (empty) text in front of that match is returned as well:

['', 'This is update 1.', 'This is update 2.', 'This is update 3.']
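If you want exactly the expected output (no empty first entry, no trailing period), one option is to filter out empty strings and strip trailing periods from each piece. This is a minimal sketch, assuming every update ends with a period that you do not want to keep:

```python
import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

# Same pattern as above, using only non-capturing groups
pattern = r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"

# Skip the empty leading entry and strip trailing periods from each update
split_message = [part.rstrip(".") for part in re.split(pattern, message) if part]

print(split_message)  # ['This is update 1', 'This is update 2', 'This is update 3']
```

Note that rstrip(".") removes all trailing periods, so an update ending in "..." would lose all three.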

As stated in the re.split documentation:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
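A small illustration of that behaviour on a made-up input string (not from the question): with a capturing group the matched separators appear in the result, and with a non-capturing group they do not. Because the string starts with a match, both results also begin with an empty string.

```python
import re

text = "1a2b3c"  # hypothetical example input

# Capturing group: each matched digit is returned between the pieces
with_groups = re.split(r"(\d)", text)

# Non-capturing group: only the text between matches is returned
without_groups = re.split(r"(?:\d)", text)

print(with_groups)     # ['', '1', 'a', '2', 'b', '3', 'c']
print(without_groups)  # ['', 'a', 'b', 'c']
```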

This doesn't use regex, but I wanted to highlight how well plain Python string splitting handles tasks like this. It is easier to understand and causes far fewer headaches.

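```python
message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
# Split on the "UTC" marker; the first piece is only the date before the first marker
values = message.split("UTC")
values = values[1:]
# Keep each piece up to its first "." (this drops the following date)
result = [v.split(".")[0] for v in values]
print(result)  # ['This is update 1', 'This is update 2', 'This is update 3']
```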

Note: this may not work if your messages ("This is update 1.") contain more than one "." character.
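One way to relax that limitation is to split each piece on its last "." with str.rsplit instead of its first. This sketch assumes that each update ends with a period immediately followed by the next date, that the dates contain no ".", and that "UTC" never appears inside an update. The sample message with "v1.2" in it is made up for illustration.

```python
message = ("Aug 10, 17:04 UTCThis is update v1.2.Aug 10, 15:56 UTCThis is update 2."
           "Aug 10, 15:55 UTCThis is update 3.")

# Everything after each "UTC" marker is one update followed by the next date
values = message.split("UTC")[1:]

# Split on the LAST "." so periods inside the update text are kept
result = [v.rsplit(".", 1)[0] for v in values]

print(result)  # ['This is update v1.2', 'This is update 2', 'This is update 3']
```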