
Using regular expressions to extract strings from a text file

Essentially, I have a txt document:

The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering, a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.

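Using regular expressions, I need to print everything that appears inside double quotes. I don't want complete code; I just need to know how I should go about it and which regular expression would be most useful. Tips and pointers, please!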

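r'(".*?")' matches every string enclosed in double quotes. The parentheses mark a capture group, . matches any character (except a newline), * means repetition, and ? makes the repetition non-greedy (it stops at the next double quote). If needed, add the re.DOTALL option so that . also matches newlines.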
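Where the parentheses sit decides whether the quotes themselves come back with each match. A minimal sketch, using a short made-up sample string rather than the full passage (the variable names are illustrative only):

```python
import re

# Short sample assembled for illustration; not the asker's full document.
sample = '"So-ho!" the guard sang out. "Yo there! Stand!"'

# Quotes inside the group: each match keeps its surrounding double quotes.
with_quotes = re.findall(r'(".*?")', sample)

# Quotes outside the group: only the text between them is captured.
without_quotes = re.findall(r'"(.*?)"', sample)

print(with_quotes)
print(without_quotes)
```

Which form to prefer depends on whether the output should keep the quotation marks.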

This should do it (explanation below):

from __future__ import print_function

import re

txt = """The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering,
a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
"""

strings = re.findall(r'"(.*?)"', txt)

for s in strings:
    print(s)

Result:

So-ho!
Yo there! Stand! I shall fire!
Is that the Dover mail?
Never you mind what it is!
What are you?
_Is_ that the Dover mail?
Why do you want to know?
I want a passenger, if it is.
What passenger?
Mr. Jarvis Lorry.
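The question mentions a text file, while the snippet above works on an inline string. The following is only a sketch of the same re.findall call applied to a file's contents; the temporary file and its three lines are stand-ins invented here so the example runs on its own, and a real script would open the actual document instead.

```python
import os
import re
import tempfile

# Hypothetical stand-in for the asker's txt document, written to a
# temporary file so that this sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write('"What passenger?"\n"Mr. Jarvis Lorry."\nOur booked passenger showed.\n')
    path = handle.name

# Read the whole file into one string and run the same pattern over it.
with open(path) as source:
    quoted = re.findall(r'"(.*?)"', source.read())

os.remove(path)  # clean up the temporary file

for s in quoted:
    print(s)
```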

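r'"(.*?)"' matches every string enclosed in double quotes. The parentheses mark a capture group, so you get only the text without the double quotes. . matches any character (except a newline), and * means "zero or more of the preceding thing", which here is the . character. The ? makes the * non-greedy, meaning it matches as little as possible. Without the ?, each match runs from the first double quote to the last one that . can reach: a line holding two quoted strings comes back as one merged result, and with re.DOTALL you get a single string containing everything between the first and last double quotes in the whole text.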
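The difference between the greedy and non-greedy forms is easiest to see on the one line of the passage that holds two quoted strings. A small sketch (the variable names are illustrative only):

```python
import re

# The line from the passage that contains two separate quoted strings.
line = '"Never you mind what it is!" the guard retorted. "What are you?"'

# Non-greedy: stop at the nearest closing quote, giving two matches.
lazy = re.findall(r'"(.*?)"', line)

# Greedy: run to the last quote on the line, giving one merged match.
greedy = re.findall(r'"(.*)"', line)

print(lazy)
print(greedy)
```

The greedy result swallows the narration between the two quotations, which is why the non-greedy form is the one used above.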

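You can pass the re.DOTALL flag so that . also matches newlines, which you need if you want to extract strings that span several lines. In that case use re.findall(r'"(.*?)"', txt, re.DOTALL). The newlines are kept in the extracted strings, so you will have to handle them yourself.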
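As a rough illustration of the flag's effect, the sketch below uses a made-up sentence in which one quotation wraps onto a second line. Collapsing the whitespace afterwards is just one possible way to deal with the embedded newline, not something the answer prescribes.

```python
import re

# Made-up example: a single quotation that wraps across a line break.
wrapped = 'He called, "Is that the\nDover mail?" and waited.'

# Without the flag, . cannot cross the newline, so nothing matches here.
default = re.findall(r'"(.*?)"', wrapped)

# With re.DOTALL, the quotation is found, newline included.
dotall = re.findall(r'"(.*?)"', wrapped, re.DOTALL)

# One way to handle the embedded newline: collapse all whitespace runs.
cleaned = [" ".join(s.split()) for s in dotall]

print(default)
print(dotall)
print(cleaned)
```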

This explanation is inevitably similar to, and based on, @TigerhawkT3's answer. Upvote that answer too!

