I want to capture the text between lines that start with a single semicolon:

Sample input:

s = '''
;

the color blue

;

the color green

;

the color red

;
'''

This is the desired output:

['the color blue', 'the color green', 'the color red']

This attempted solution doesn't work:

import re
pat = r'^;(.*)^;'
r = re.findall(pat, s, re.S|re.M)
print(r)

This is the incorrect output:

['\n\nthe color blue\n\n;\n\nthe color green\n\n;\n\nthe color red\n\n']
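Why the attempt fails: with re.S, .* is greedy, so it runs all the way to the last line that starts with ; and the whole text comes back as a single match. The sketch below reuses the sample s and contrasts three variants. The variable names are illustrative only.

```python
import re

s = '''
;

the color blue

;

the color green

;

the color red

;
'''

flags = re.S | re.M  # DOTALL: '.' matches newlines; MULTILINE: '^' matches at each line start

# Original attempt: greedy .* swallows everything up to the LAST ';' line.
greedy = re.findall(r'^;(.*)^;', s, flags)

# Lazy .*? stops at the NEXT ';' line, but the match consumes that ';',
# so it cannot open the following block and every other block is skipped.
lazy_consuming = re.findall(r'^;(.*?)^;', s, flags)

# Lazy .*? plus a lookahead (?=^;): the closing ';' is checked but not consumed,
# so it can start the next match. strip() removes the surrounding blank lines.
lazy_lookahead = [m.strip() for m in re.findall(r'^;(.*?)(?=^;)', s, flags)]

print(len(greedy))                          # 1
print([m.strip() for m in lazy_consuming])  # ['the color blue', 'the color red']
print(lazy_lookahead)                       # ['the color blue', 'the color green', 'the color red']
```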

===============>>#1 votes: 1

A non-regex solution: I split on ; and then remove the empty strings.

s = '''
;

the color blue

;

the color green

;

the color red

;
'''

f = s.split(';')


x = [a.strip('\n') for a in f]

print(x) #prints ['', 'the color blue', 'the color green', 'the color red', '']

a = [elem for elem in x if len(elem)]

print(a) #prints ['the color blue', 'the color green', 'the color red']
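One limitation of split(';') is that a semicolon inside the text is also treated as a delimiter. A minimal line-based sketch, assuming a delimiter is a line containing only ; (possibly with surrounding whitespace); the function name blocks_between_semicolon_lines is hypothetical:

```python
def blocks_between_semicolon_lines(text):
    """Return the stripped, non-empty text found between lines that are just ';'."""
    blocks = []
    current = None  # None until the first delimiter line has been seen
    for line in text.splitlines():
        if line.strip() == ';':
            # A delimiter line closes the current block (if any) and opens a new one.
            if current is not None:
                block = '\n'.join(current).strip()
                if block:
                    blocks.append(block)
            current = []
        elif current is not None:
            current.append(line)
    # Text after the last delimiter has no closing ';' and is ignored.
    return blocks


s = '''
;

the color blue

;

the color green

;

the color red

;
'''

print(blocks_between_semicolon_lines(s))
# ['the color blue', 'the color green', 'the color red']
print(blocks_between_semicolon_lines(';\nprice; tax\n;\n'))
# ['price; tax']  (s.split(';') would break this into 'price' and ' tax')
```

Multi-line blocks are kept together, joined with newlines, because only whole-line ; markers split the input.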

===============>>#2 votes: 1 accepted

Treat it like a delimiter.

(?sm)^;\s*\r?\n(.*?)\s*(?=^;\s*\r?\n)

https://regex101.com/r/4tKX0F/1

Explanation

 (?sm)                         # Modifiers: dot-all, multi-line
 ^ ; \s* \r? \n                # Beginning delimiter
 ( .*? )                       # (1), Text 
 \s*                           # Wsp trim
 (?= ^ ; \s* \r? \n )          # End delimiter
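A short usage sketch of this pattern in Python, written as a raw string so each \s and \n is a single regex escape. The \r? lets it accept Windows (CRLF) line endings too. Note that the closing ; line must itself end with a newline; otherwise the end-delimiter lookahead fails and the last block is not matched.

```python
import re

s = '''
;

the color blue

;

the color green

;

the color red

;
'''

# Same pattern as above, compiled once; (?sm) = DOTALL + MULTILINE.
pat = re.compile(r'(?sm)^;\s*\r?\n(.*?)\s*(?=^;\s*\r?\n)')

print(pat.findall(s))
# ['the color blue', 'the color green', 'the color red']

# CRLF line endings: the \r? (and \s*) absorb the carriage returns.
print(pat.findall(s.replace('\n', '\r\n')))
# ['the color blue', 'the color green', 'the color red']

# If the final ';' is not followed by a newline, the lookahead
# ^;\s*\r?\n cannot match there, so the last block is dropped.
print(pat.findall(s.rstrip('\n')))
# ['the color blue', 'the color green']
```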

===============>>#3 votes: 0

You can use this as the pattern:

pat = r';\n\n([\w* *]*)'

r = re.findall(pat, s)

For the sample input, that should capture what you need.
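A quick check of the assumptions built into this pattern: it requires exactly ;\n\n before each block, and the character class [\w* *] (equivalent to [\w *], i.e. word characters, spaces and a literal *) stops at any other character. The two extra inputs below are made-up illustrations, not from the question.

```python
import re

s = '''
;

the color blue

;

the color green

;

the color red

;
'''

pat = r';\n\n([\w* *]*)'

print(re.findall(pat, s))
# ['the color blue', 'the color green', 'the color red']

# Only one newline after ';': the literal ';\n\n' never matches.
print(re.findall(pat, ';\nthe color blue\n;\n'))
# []

# Punctuation is outside the character class, so the capture stops early.
print(re.findall(pat, ';\n\nred, green\n;\n'))
# ['red']
```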

===============>>#4 votes: 0

You can use the pattern ;\s*(.*?)\s*(?=;) like this:

print( re.findall(r'(?s);\s*(.*?)\s*(?=;)', s) )
# output: ['the color blue', 'the color green', 'the color red']

Explanation:

(?s)   # dot-all modifier (. matches newlines)
;      # consume a semicolon
\s*    # skip whitespace
(.*?)  # capture the following text, as little as possible, such that...
\s*    # ... it is followed only by (optional) whitespace, and...
(?=;)  # ... a semicolon
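Because (?=;) stops at any semicolon, not only one on its own line, this pattern also splits text that itself contains a ;. A small comparison with the accepted answer's line-anchored pattern, using a made-up input:

```python
import re

# This answer: any ';' ends the current block.
any_semicolon = re.compile(r'(?s);\s*(.*?)\s*(?=;)')
# Accepted answer: only a ';' at the start of a line (followed by a newline) is a delimiter.
line_semicolon = re.compile(r'(?sm)^;\s*\r?\n(.*?)\s*(?=^;\s*\r?\n)')

text = ';\nprice; tax\n;\n'

print(any_semicolon.findall(text))   # ['price', 'tax']
print(line_semicolon.findall(text))  # ['price; tax']
```

For the question's sample input both patterns return the same three colors; they differ only when the text between delimiters contains semicolons.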

===============>>#5 votes: 0

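I know you didn't ask for this, but pyparsing is worth considering as an alternative to re (pyparsing itself can also use regular expressions). Notice how this simple parser copes with varying numbers of blank lines.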

>>> parsifal = open('temp.txt').read()
>>> print (parsifal)


;

the colour blue
;
the colour green
;
the colour red
;
the colour purple




;

the colour magenta

;


>>> import pyparsing as pp
>>> p = pp.OneOrMore(pp.Suppress(';\n')+pp.ZeroOrMore(pp.Suppress('\n'))+pp.CharsNotIn(';\n')+pp.ZeroOrMore(pp.Suppress('\n')))
>>> p.parseString(parsifal)
(['the colour blue', 'the colour green', 'the colour red', 'the colour purple', 'the colour magenta'], {})

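Overall, the parser matches OneOrMore repetitions of: a semicolon followed by a newline, any number of further newlines, a run of characters that are neither semicolons nor newlines (CharsNotIn), and any number of trailing newlines. Everything except the CharsNotIn text is wrapped in Suppress, so only the colour names are returned.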

  asked by Nicholas Nickleby, translated from SO
