How to extract text part from file using Python & Regular Expressions
Using regular expressions to extract string from text file
Basically, I have a .txt document like this:
The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering, a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
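Using regular expressions, I need to print everything that is inside double quotes. I don't want the full code; I just need to know how to go about it and which regular expression would be most useful. Tips and pointers, please!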
r'(".*?")'
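matches every string in double quotes, with the quotes included. The parentheses mark a capturing group. The `.` matches any character (except a newline), `*` means repetition (zero or more times), and `?` makes it non-greedy, so each match stops at the next double quote instead of running on to a later one. If needed, add the `re.DOTALL` flag so that `.` also matches newlines.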
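A quick check of that pattern, placed next to the variant that puts the group inside the quotes. The `line` string is simply one line copied from the passage; `with_quotes` and `without_quotes` are illustrative names.

```python
import re

# One line from the passage that contains two quoted strings
line = '"Never you mind what it is!" the guard retorted. "What are you?"'

# The group wraps the quotes, so each result keeps them
with_quotes = re.findall(r'(".*?")', line)
# The group sits inside the quotes, so each result drops them
without_quotes = re.findall(r'"(.*?)"', line)

print(with_quotes)     # ['"Never you mind what it is!"', '"What are you?"']
print(without_quotes)  # ['Never you mind what it is!', 'What are you?']
```

Which one you want depends on whether the quote characters themselves should end up in the output.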
This should do it (explanation below):
```python
from __future__ import print_function
import re

txt = """The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering,
a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
"""

# Capture the text between each pair of double quotes (non-greedy)
strings = re.findall(r'"(.*?)"', txt)
for s in strings:
    print(s)
```
Result:
So-ho!
Yo there! Stand! I shall fire!
Is that the Dover mail?
Never you mind what it is!
What are you?
_Is_ that the Dover mail?
Why do you want to know?
I want a passenger, if it is.
What passenger?
Mr. Jarvis Lorry.
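The code above embeds the passage as a string literal. Since the question starts from a .txt file, here is a minimal sketch that reads the text from disk first. `extract_quoted` and `passage.txt` are hypothetical names, and UTF-8 encoding is an assumption about the file. The demo writes a small sample file to a temporary directory so the snippet runs on its own.

```python
import re
import tempfile
from pathlib import Path


def extract_quoted(path):
    """Hypothetical helper: return every double-quoted string in the file at `path`."""
    text = Path(path).read_text(encoding="utf-8")  # assumes a UTF-8 text file
    return re.findall(r'"(.*?)"', text)


# Demo: create a throwaway sample file, then extract from it
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "passage.txt"  # hypothetical filename
    sample.write_text('"What passenger?"\n"Mr. Jarvis Lorry."\n', encoding="utf-8")
    quotes = extract_quoted(sample)

for q in quotes:
    print(q)
```

With your real file, you would just call `extract_quoted("your_file.txt")`.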
r'"(.*?)"'
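matches every string in double quotes. The parentheses mark a capturing group, so you get only the text, without the double quotes. The `.` matches any character (except a newline), and `*` means "zero or more of the previous thing", the previous thing here being `.`. The `?` after the `*` makes the `*` "non-greedy", meaning it matches as little as possible. Without the `?`, the match is greedy and runs to the last double quote it can reach. Without `re.DOTALL`, that is the last quote on the same line, so a line holding two quoted strings gives one combined result. With `re.DOTALL`, you get a single result containing everything between the first and the last double quote in the whole text.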
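To see the greedy vs. non-greedy difference concretely, here is a small comparison on two consecutive lines of the passage. The variable names are illustrative only.

```python
import re

# Two consecutive lines from the passage; the first holds two quoted strings
text = ('"Never you mind what it is!" the guard retorted. "What are you?"\n'
        '"_Is_ that the Dover mail?"')

# Non-greedy: each match stops at the nearest closing quote -> 3 results
lazy = re.findall(r'"(.*?)"', text)
# Greedy: each match runs to the last quote on its line -> 2 results
greedy = re.findall(r'"(.*)"', text)
# Greedy + DOTALL: '.' also crosses the newline -> 1 result spanning everything
greedy_dotall = re.findall(r'"(.*)"', text, re.DOTALL)

for name, result in [("lazy", lazy), ("greedy", greedy), ("greedy_dotall", greedy_dotall)]:
    print(name, len(result), result)
```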
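You can pass the `re.DOTALL` flag so that `.` also matches newlines, if you want to extract strings that span more than one line. To do that, use `re.findall(r'"(.*?)"', txt, re.DOTALL)`. The newlines will be included in the extracted strings, so you will have to handle them.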
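A sketch of the `re.DOTALL` case. The sample text is hypothetical: it wraps one quote across a line break, which the original passage does not do. Collapsing whitespace with `" ".join(s.split())` is one possible way to deal with the captured newline, not the only one.

```python
import re

# Hypothetical sample: the second quote is wrapped across a line break
text = ('"So-ho!" the guard sang out.\n'
        'A voice called, "Is that the\n'
        'Dover mail?"')

# Without DOTALL, '.' stops at '\n', so the wrapped quote is not found at all
plain = re.findall(r'"(.*?)"', text)
# With DOTALL, '.' also matches '\n', so the wrapped quote is captured
dotall = re.findall(r'"(.*?)"', text, re.DOTALL)
# The newline stays inside the capture; collapse whitespace runs to single spaces
cleaned = [" ".join(s.split()) for s in dotall]

print(plain)    # ['So-ho!']
print(dotall)   # ['So-ho!', 'Is that the\nDover mail?']
print(cleaned)  # ['So-ho!', 'Is that the Dover mail?']
```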
The explanation is inevitably similar to / based on @TigerhawkT3's answer. Upvote that answer too!