re.findall multiline python
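re.findall with re.M does not find the multi-line match I am searching for
I am trying to extract every multi-line string that matches a pattern from a file.
The file book.txt contains: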
Title: Le Morte D'Arthur, Volume I (of II)
King Arthur and of his Noble Knights of the Round Table
Author: Thomas Malory
Editor: William Caxton
Release Date: March, 1998 [Etext #1251]
Posting Date: November 6, 2009
Language: English
Title: Pride and Prejudice
Author: Jane Austen
Posting Date: August 26, 2008 [EBook #1342]
Release Date: June, 1998
Last Updated: October 17, 2016
Language: English
The following code returns only the first line, Le Morte D'Arthur, Volume I (of II):
re.findall('^Title:\s(.+)$', book, re.M)
The output I expect is
[" Le Morte D'Arthur, Volume I (of II)\n King Arthur and of his Noble Knights of the Round Table", ' Pride and Prejudice']
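To clarify:
- The second line is optional: it is present in some files and absent in others. After the second line there is more text that I do not want to read.
- Using re.findall(r'Title: (.+\n.+)$', text, flags=re.MULTILINE) works, but fails if the second line is only whitespace.
- I am running Python 3.7.
- I read the txt file into a string and then run re on that str.
- The following do not work either:
re.findall(r'^Title:\s(.+)$', text, re.S)
re.findall(r'^Title:\s(.+)$', text, re.DOTALL)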
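My guess is that this expression,
(?<=Title:\s)(.*?)\s*(?=Author)
may be close to what you are trying to design. It assumes that an Author line follows each title.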
import re
regex = r"(?<=Title:\s)(.*?)\s*(?=Author)"
# Each title must be followed by an "Author" line, because the lookahead needs it.
test_str = ("Title: Le Morte D'Arthur, Volume I (of II)\n"
" King Arthur and of his Noble Knights of the Round Table\n"
"Author: Thomas Malory\n\n"
"Title: Pride and Prejudice\n"
"Author: Jane Austen\n")
# re.DOTALL lets "." cross the newline between the two title lines.
print(re.findall(regex, test_str, re.DOTALL))
Output:
["Le Morte D'Arthur, Volume I (of II)\n King Arthur and of his Noble Knights of the Round Table", 'Pride and Prejudice']
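If an Author line is not guaranteed to follow the title, one alternative is to capture the title line plus at most one following line that is indented and not blank. The sketch below is only an illustration under that assumption: the sample string `book`, the one-space indentation of the continuation line, and the names `pattern` and `titles` are made up for the example and are not taken from the real book.txt.

```python
import re

# Hypothetical sample modelled on the question's book.txt. The indented
# continuation line is an assumption about how the real file is laid out.
book = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    " King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "Title: Pride and Prejudice\n"
    "   \n"  # second line that is only whitespace
    "Author: Jane Austen\n"
)

# ^Title:[ \t]*        "Title:" at the start of a line (needs re.M)
# .+                   the rest of the title line
# (?:\n[ \t]+\S.*)?    optionally one more line that is indented and non-blank
pattern = re.compile(r"^Title:[ \t]*(.+(?:\n[ \t]+\S.*)?)", re.M)

titles = pattern.findall(book)
print(titles)
# ["Le Morte D'Arthur, Volume I (of II)\n King Arthur and of his Noble Knights of the Round Table", 'Pride and Prejudice']
```

Because the optional group requires a non-whitespace character (`\S`) after the indentation, a second line that holds only whitespace is skipped instead of breaking the match.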
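You can use the regular expression with the DOTALL flag so that . also matches newline characters:
re.findall('^Title:\s(.+)$', book, re.DOTALL)
Output (for a string that holds only the two title lines):
Le Morte D'Arthur, Volume I (of II)\n King Arthur and of his Noble Knights of the Round Table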
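To see why the choice of flag matters here, the following minimal sketch runs the same pattern under re.M and under re.DOTALL. The short string `text` is a made-up stand-in for the book file, and the variable names are illustrative only.

```python
import re

# Made-up miniature of the file: two records, the first with a second title line.
text = "Title: A\n sub\nAuthor: X\nTitle: B\nAuthor: Y"

# re.M: ^ and $ match at every line boundary, but "." still stops at a newline,
# so each match is limited to a single line.
multiline_only = re.findall(r"^Title:\s(.+)$", text, re.M)
print(multiline_only)  # ['A', 'B']

# re.DOTALL: "." also matches newlines, but ^ now matches only at the very start
# of the string, so the greedy .+ runs on to the end of the text.
dotall_only = re.findall(r"^Title:\s(.+)$", text, re.DOTALL)
print(dotall_only)  # ['A\n sub\nAuthor: X\nTitle: B\nAuthor: Y']
```

With re.DOTALL alone the result is a single match that swallows everything after the first "Title: ", which is consistent with the question's note that this variant does not work on the full file.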