
Regex to split text based on a sentence pattern

I have a text that looks something like: ' 19:54:12 From  X  to  Y: some text after 21:08:15 From  A  to  B:another text '

I want to split the text based on the 19:54:12 From  X  to  Y: sentence pattern. Ideally the result would look something like this: ['19:54:12 From  X  to  Y:', ' some text after', '21:08:15 From  A  to  B:', 'another text'].

X and Y can be multiple words, including symbols. Note that there is one space between the time string and the word 'From', but two spaces between the elements after that.

I'm using Python. I've managed to split the text based on the time string: re.split('(\d{2}:\d{2}:\d{2})+\s', string). However, I'd like the split to take into account the word structure that follows, including the colon at the end, and to keep those words together with the time in the output list.

Help much appreciated!
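For reference, a minimal sketch of what the time-only split described above returns, assuming the sample string uses two spaces between the elements after the time. The variable names are illustrative, and the pattern is written as a raw string so the backslashes are passed to the regex engine unchanged.

```python
import re

# Sample text, assuming two spaces between the elements after the time string.
s = '19:54:12 From  X  to  Y: some text after 21:08:15 From  A  to  B:another text'

# The attempt from the question: split on the time string only.
attempt = re.split(r'(\d{2}:\d{2}:\d{2})+\s', s)
print(attempt)
# ['', '19:54:12', 'From  X  to  Y: some text after ', '21:08:15', 'From  A  to  B:another text']
```

The time strings are kept because they sit in a capture group, but each "From ... :" header stays attached to the message that follows it, which is the part that still needs solving.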

You can split using this regex, which matches the time string followed by From and all the characters up to the next colon:

(\d{2}:\d{2}:\d{2} From  [^:]*:)

In Python:

import re

s = '19:54:12 From  X  to  Y: some text after 21:08:15 From  A  to  B:another text'
# The capture group keeps each matched header in the result list.
re.split(r'(\d{2}:\d{2}:\d{2} From  [^:]*:)', s)

Output:

[
 '',
 '19:54:12 From  X  to  Y:',
 ' some text after ',
 '21:08:15 From  A  to  B:',
 'another text'
]

Note that the list starts with an empty string because the split pattern matches at the very beginning of the string; you can remove it with a list comprehension, e.g.

[part for part in re.split(r'(\d{2}:\d{2}:\d{2} From  [^:]*:)', s) if part]
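If the goal is to keep each header together with its message, the split result can be paired up afterwards. The following is only a sketch under a few assumptions: split_messages is a hypothetical helper name, and \s+ is swapped in for the literal spaces so that one or two spaces are both accepted. A name that itself contains a colon would still cut the header short, because [^:]* stops at the first colon.

```python
import re

# Header: time string, "From", then everything up to the next colon.
# Assumption: \s+ replaces the literal spaces to tolerate one or two spaces.
HEADER = re.compile(r'(\d{2}:\d{2}:\d{2}\s+From\s+[^:]*:)')

def split_messages(text):
    """Return (header, body) pairs. Hypothetical helper, not part of the answer."""
    parts = HEADER.split(text)
    # parts[0] is the text before the first header; headers sit at odd indices,
    # each followed by the message body at the next even index.
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]

sample = '19:54:12 From  X  to  Y: some text after 21:08:15 From  A  to  B:another text'
pairs = split_messages(sample)
for header, body in pairs:
    print(header, '->', body)
```

Stepping through the odd indices relies on the pattern having exactly one capture group, so re.split always alternates between non-matching text and header.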
