
Python regex, matching a pattern over multiple lines: why isn't this working?

I know that for parsing I should ideally remove all spaces and line breaks, but I was just doing this as a quick fix for something I was trying, and I can't figure out why it's not working. I have wrapped different areas of text in my document with wrappers like "####1" and am trying to parse based on them, but it doesn't work no matter what I try. I think I am using multiline correctly. Any advice is appreciated.

This returns no results at all:

string='
####1
ttteest
####1
ttttteeeestt

####2   

ttest
####2'

import re
pattern = '.*?####(.*?)####'
returnmatch = re.compile(pattern, re.MULTILINE).findall(string)
return returnmatch

MULTILINE doesn't mean that . will match a line break. It means that ^ and $ match at the start and end of every line, not only at the start and end of the whole string:

re.M / re.MULTILINE

When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.
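To make the difference concrete, here is a small sketch on a made-up three-line string (the text and variable names are illustrative only, not from the question):

```python
import re

text = "first line\nsecond line\nthird line"

# Without re.MULTILINE, ^ anchors only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
print(default_starts)  # ['first']

# With re.MULTILINE, ^ also anchors right after every newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(multiline_starts)  # ['first', 'second', 'third']

# $ behaves the same way at the end of each line.
multiline_ends = re.findall(r"\w+$", text, re.MULTILINE)
print(multiline_ends)  # ['line', 'line', 'line']

# MULTILINE does not change '.': it still stops at a newline.
dot_run = re.findall(r"first.*", text, re.MULTILINE)
print(dot_run)  # ['first line']
```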

re.S or re.DOTALL makes . match newlines as well.
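Applied to the question, a minimal sketch: the sample text is rewritten as a triple-quoted literal (assuming no trailing spaces after "####2"), and the original pattern is run once with re.MULTILINE and once with re.DOTALL:

```python
import re

# The question's sample text; trailing spaces after "####2" are assumed absent.
sample = """
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2"""

pattern = ".*?####(.*?)####"

# MULTILINE only affects ^ and $, so '.' still cannot cross a newline,
# and no line holds two "####" markers: nothing matches.
multiline_result = re.findall(pattern, sample, re.MULTILINE)
print(multiline_result)  # []

# DOTALL lets '.' match "\n", so the lazy groups can span lines.
dotall_result = re.findall(pattern, sample, re.DOTALL)
print(dotall_result)  # ['1\nttteest\n', '2\n\nttest\n']
```

With DOTALL the match succeeds, but each captured string still mixes the section number with the surrounding newlines.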

Source: http://docs.python.org/

Try re.findall(r"####(.*?)\s(.*?)\s####", string, re.DOTALL) (it works with re.compile too, of course).

This regex returns tuples containing the section number and the section content.

For your example, this returns [('1', 'ttteest'), ('2', ' \n\nttest')].

(BTW: your example won't run as written; for a string literal that spans multiple lines, use ''' or """.)
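Putting this answer together, a minimal sketch: the sample text as a triple-quoted literal (trailing whitespace after "####2" assumed absent), the answer's pattern compiled with re.DOTALL, and an extra .strip() step, which is an addition of this sketch rather than part of the answer, because the raw content depends on the blank lines and spaces around each section:

```python
import re

# Triple quotes allow a string literal to span several lines.
# Trailing whitespace after "####2" is assumed absent here.
document = """
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2"""

# Compiled once with re.DOTALL so '.' also matches newlines.
section_re = re.compile(r"####(.*?)\s(.*?)\s####", re.DOTALL)

# Two capture groups, so findall returns (number, content) tuples.
raw_sections = section_re.findall(document)
print(raw_sections)  # [('1', 'ttteest'), ('2', '\nttest')]

# Extra step (not in the answer): drop whitespace around each content.
sections = [(number, content.strip()) for number, content in raw_sections]
print(sections)  # [('1', 'ttteest'), ('2', 'ttest')]
```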
