How can one use regex to capture the text occurring between lines beginning with a single semicolon?
I want to capture the text between lines that begin with a single semicolon.
Sample input:
s = '''
;
the color blue
;
the color green
;
the color red
;
'''
This is the desired output:
['the color blue', 'the color green', 'the color red']
This attempted solution does not work:
import re
pat = r'^;(.*)^;'
r = re.findall(pat, s, re.S|re.M)
print(r)
This is the incorrect output:
['\n\nthe color blue\n\n;\n\nthe color green\n\n;\n\nthe color red\n\n']
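The attempt fails because .* is greedy and, under re.S, also matches newlines, so a single match runs from the first ; line to the last one. A minimal sketch of one repair, keeping the same re.S|re.M flags: make the group lazy, turn the closing ^; into a lookahead so that the same semicolon can open the next match, and strip the surrounding newlines afterwards.

```python
import re

s = '''
;
the color blue
;
the color green
;
the color red
;
'''

# Lazy (.*?) stops at the first following line that starts with ';'.
# The lookahead (?=^;) checks for that line without consuming it,
# so the same ';' can begin the next match.
pat = r'^;(.*?)(?=^;)'
matches = re.findall(pat, s, re.S | re.M)

# Each capture still includes the newlines around the text.
result = [m.strip() for m in matches]
print(result)
```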
As a non-regex workaround, I split on ';' and remove the empty strings:
s = '''
;
the color blue
;
the color green
;
the color red
;
'''
f = s.split(';')
x = [a.strip('\n') for a in f]
print(x) #prints ['', 'the color blue', 'the color green', 'the color red', '']
a = [elem for elem in x if len(elem)]
print(a) #prints ['the color blue', 'the color green', 'the color red']
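The same split-and-filter approach fits in two comprehensions. This sketch strips all surrounding whitespace (not only '\n'), so stray spaces or '\r' characters are removed too. Like the original, it splits on every ';', including any that appear in the middle of a line.

```python
s = '''
;
the color blue
;
the color green
;
the color red
;
'''

# Split on every ';', trim whitespace, and drop the empty pieces
# produced before the first and after the last separator.
pieces = [part.strip() for part in s.split(';')]
a = [part for part in pieces if part]
print(a)
```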
Treat the semicolon lines as delimiters.
(?sm)^;\s*\r?\n(.*?)\s*(?=^;\s*\r?\n)
https://regex101.com/r/4tKX0F/1
Explanation
(?sm) # Modifiers: dot-all, multi-line
^ ; \s* \r? \n # Beginning delimiter
( .*? ) # (1), Text
\s* # Wsp trim
(?= ^ ; \s* \r? \n ) # End delimiter
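A sketch of the delimiter pattern in Python, applied to the sample input and to a copy converted to Windows '\r\n' line endings; the optional \r? in both delimiters is what lets the second case work. The name pat is just a local variable reused from the question.

```python
import re

# Opening delimiter line, lazily captured text, optional trailing whitespace,
# then a lookahead for the next delimiter line (not consumed, so it can
# also serve as the opening delimiter of the next match).
pat = r'(?sm)^;\s*\r?\n(.*?)\s*(?=^;\s*\r?\n)'

s = '''
;
the color blue
;
the color green
;
the color red
;
'''

print(re.findall(pat, s))
print(re.findall(pat, s.replace('\n', '\r\n')))
```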
You can use this as the pattern:
pat = r';\n\n([\w* *]*)'
r = re.findall(pat, s)
That should capture what you need.
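This pattern only matches when each semicolon is followed by a blank line, because it literally requires ';\n\n'. The sketch below runs it on two layouts: one with blank lines (similar to the spacing visible in the question's incorrect output) and the compact sample input, where it finds nothing. Note also that inside a character class * is a literal asterisk, so [\w* *] means word characters, spaces, and '*'.

```python
import re

pat = r';\n\n([\w* *]*)'

# A blank line after each ';', which is the layout the pattern expects.
spaced = '\n;\n\nthe color blue\n\n;\n\nthe color green\n\n;\n\nthe color red\n\n;\n'
# The sample input from the question, with no blank lines.
compact = '\n;\nthe color blue\n;\nthe color green\n;\nthe color red\n;\n'

print(re.findall(pat, spaced))
print(re.findall(pat, compact))
```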
You can use the pattern ;\s*(.*?)\s*(?=;)
Usage:
print( re.findall(r'(?s);\s*(.*?)\s*(?=;)', s) )
# output: ['the color blue', 'the color green', 'the color red']
Explanation:
(?s) # dot-all modifier (. matches newlines)
; # consume a semicolon
\s* # skip whitespace
(.*?) # capture the following text, as little as possible, such that...
\s* # ... it is followed only by (optional) whitespace, and...
(?=;) # ... a semicolon
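Because this pattern treats every ; as a delimiter, a semicolon inside the text also ends a capture. If only lines that begin with ; should count, one option is to anchor both delimiters with ^ and add the multi-line flag. This is a sketch that assumes the delimiter lines contain nothing but the semicolon; the sample text is made up to show the difference.

```python
import re

text = '''
;
the color red; also called crimson
;
the color blue
;
'''

# Any ';' counts as a delimiter, so the first block is cut in two.
loose = re.findall(r'(?s);\s*(.*?)\s*(?=;)', text)

# Only a ';' at the start of a line counts (m makes ^ match after each newline).
anchored = re.findall(r'(?sm)^;\s*(.*?)\s*(?=^;)', text)

print(loose)
print(anchored)
```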
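I know you didn't ask for this, but pyparsing is worth considering as an alternative to re. pyparsing can even use regular expressions as part of a grammar. Note how this simple parser copes with varying numbers of blank lines.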
>>> parsifal = open('temp.txt').read()
>>> print (parsifal)
;
the colour blue
;
the colour green
;
the colour red
;
the colour purple
;
the colour magenta
;
>>> import pyparsing as pp
>>> p = pp.OneOrMore(pp.Suppress(';\n')+pp.ZeroOrMore(pp.Suppress('\n'))+pp.CharsNotIn(';\n')+pp.ZeroOrMore(pp.Suppress('\n')))
>>> p.parseString(parsifal)
(['the colour blue', 'the colour green', 'the colour red', 'the colour purple', 'the colour magenta'], {})
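In outline, the parser matches OneOrMore repetitions of a sequence: a semicolon followed by a newline (suppressed), any number of extra newlines (suppressed), a run of characters that are neither semicolons nor newlines (kept, via CharsNotIn), and any newlines after it (suppressed).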
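For comparison, here is a plain-Python line-by-line sketch that needs no third-party package. It is not pyparsing, and the function name blocks_between_semicolons is hypothetical. A line consisting only of ; closes the current block, blank lines are skipped, and any other line is collected. In this sketch, the lines of a multi-line block are joined with a space, and text after the last ; line is dropped because it has no closing delimiter.

```python
def blocks_between_semicolons(text):
    """Return the text found between lines that consist of a single ';'.

    Hypothetical helper: blank lines are ignored, and the lines of a
    multi-line block are joined with a single space.
    """
    blocks = []
    current = None  # None until the first ';' line has been seen
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == ';':
            if current:  # close the previous block if it collected any text
                blocks.append(' '.join(current))
            current = []
        elif stripped and current is not None:
            current.append(stripped)
    return blocks


# Varying numbers of blank lines between the delimiters.
parsifal = ';\nthe colour blue\n;\n\n\nthe colour green\n;\nthe colour red\n;\n'
print(blocks_between_semicolons(parsifal))
```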