
re.findall multiline python

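re.findall with re.M does not find the multi-line text I am searching for.

I am trying to extract all multi-line strings that match a pattern from a file.

From the file book.txt: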

Title: Le Morte D'Arthur, Volume I (of II)
       King Arthur and of his Noble Knights of the Round Table

Author: Thomas Malory

Editor: William Caxton

Release Date: March, 1998  [Etext #1251]
Posting Date: November 6, 2009

Language: English

Title: Pride and Prejudice

Author: Jane Austen

Posting Date: August 26, 2008 [EBook #1342]
Release Date: June, 1998
Last Updated: October 17, 2016

Language: English

The following code returns only the first line of the title, Le Morte D'Arthur, Volume I (of II):

re.findall(r'^Title:\s(.+)$', book, re.M)

I expect the output to be

[" Le Morte D'Arthur, Volume I (of II)\n King Arthur and of his Noble Knights of the Round Table", ' Pride and Prejudice']
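A minimal sketch of why the call above stays on one line: with re.M, ^ and $ anchor at every line boundary, but . still does not match a newline, so (.+) can never cross into the continuation line. The string book below is an abbreviated stand-in for the contents of book.txt, assumed for illustration.

```python
import re

# Abbreviated stand-in for the contents of book.txt (assumption for illustration).
book = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "\n"
    "Title: Pride and Prejudice\n"
    "\n"
    "Author: Jane Austen\n"
)

# re.M lets ^ and $ match at each line boundary, but "." still stops at "\n",
# so every match covers only the first line of its title.
single_line_titles = re.findall(r'^Title:\s(.+)$', book, re.M)
print(single_line_titles)
```

Each result is cut off at the first newline, which is why the second line of the first title never appears.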

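To clarify:
- The second line is optional: it is present in some files and absent in others. After the second line there is more text that I do not want to read.
- Using re.findall(r'Title: (.+\n.+)$', text, flags=re.MULTILINE) works, but fails if the second line is only whitespace.
- I am running Python 3.7.
- I am reading the txt file into a string and then running re on that str.
- The following do not work either:
re.findall(r'^Title:\s(.+)$', text, re.S)
re.findall(r'^Title:\s(.+)$', text, re.DOTALL)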
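One way the constraints listed above could be met is sketched below. It assumes that a continuation line is always indented and contains at least one non-whitespace character, which is not something the question guarantees. The sample text and the names TITLE_RE, titles and flat_titles are hypothetical.

```python
import re

# Hypothetical sample; the third entry has a whitespace-only second line.
text = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "\n"
    "Title: Pride and Prejudice\n"
    "\n"
    "Author: Jane Austen\n"
    "\n"
    "Title: Example Title\n"
    "   \n"
    "Author: Example Author\n"
)

# Assumption: a continuation line starts with spaces or tabs and holds at
# least one non-whitespace character (\S). A blank or whitespace-only line
# fails the optional group, so only the first line is captured.
TITLE_RE = re.compile(r'^Title:[ \t]+(.+(?:\n[ \t]+\S.*)?)$', re.MULTILINE)

titles = TITLE_RE.findall(text)

# Optionally collapse the line break and indentation into single spaces.
flat_titles = [" ".join(t.split()) for t in titles]

print(titles)
print(flat_titles)
```

Because the second line sits in an optional non-capturing group, titles with and without a continuation line are handled by the same pattern, and no DOTALL flag is needed.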

My guess is that this expression,

(?<=Title:\s)(.*?)\s*(?=Author)

might be close to what is needed.


Test

import re

# The lookbehind anchors right after "Title: "; the lazy group stops before "Author".
regex = r"(?<=Title:\s)(.*?)\s*(?=Author)"

test_str = ("Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n\n"
    "Author: Thomas Malory\n\n"
    "Title: Pride and Prejudice\n\n"
    "Author: Jane Austen\n")

# re.DOTALL lets "." match newlines, so the group can span several lines.
print(re.findall(regex, test_str, re.DOTALL))

Output

["Le Morte D'Arthur, Volume I (of II)\n       King Arthur and of his Noble Knights of the Round Table", 'Pride and Prejudice']
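The lookahead in this expression relies on an Author field following every title. If that cannot be assumed, one possible variant stops at the first blank line instead. This is only a sketch, under the assumption that each title block is followed by a blank (or whitespace-only) line or by the end of the text. The sample string is hypothetical.

```python
import re

# Assumption: a title block ends at a blank or whitespace-only line,
# or at the end of the text.
regex = r"(?<=Title:\s)(.*?)(?=\n\s*\n|\s*\Z)"

# Hypothetical sample in which the first title is not followed by "Author".
sample = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Editor: William Caxton\n"
    "\n"
    "Title: Pride and Prejudice\n"
)

# re.DOTALL lets the lazy group cross the newline between the two title lines.
titles = re.findall(regex, sample, re.DOTALL)
print(titles)
```

The trade-off is that any non-blank text placed directly under the title, without an empty line in between, would be captured as well.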

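You can use a regular expression with the DOTALL flag so that . also matches newlines:

re.findall(r'^Title:\s(.+)$', book, re.DOTALL)

Output (when the string ends right after the title; on a longer text the greedy .+ continues to the end of the string):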

Le Morte D'Arthur, Volume I (of II)\n       King Arthur and of his Noble Knights of the Round Table
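To see how the flags behave on a longer text, here is a small sketch. The string book is an abbreviated stand-in for book.txt, and the second pattern assumes that an Author: line follows each title, as in the sample file.

```python
import re

# Abbreviated stand-in for the contents of book.txt (assumption for illustration).
book = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "\n"
    "Title: Pride and Prejudice\n"
    "\n"
    "Author: Jane Austen\n"
)

# DOTALL alone: "^" matches only at the start of the string, and the greedy
# ".+" now crosses newlines, so a single match runs to the end of the text.
dotall_only = re.findall(r'^Title:\s(.+)$', book, re.DOTALL)
print(len(dotall_only))

# MULTILINE and DOTALL together, with a lazy ".+?" that stops before the next
# line starting with "Author:" (assumed to follow every title).
both_flags = re.findall(r'^Title:\s(.+?)\s*(?=^Author:)', book, re.M | re.S)
print(both_flags)
```

Combining the two flags keeps ^ anchored to line starts while still letting the group span the line break, and the lazy quantifier keeps each match from running past its own title.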



 