How to extract text part from file using Python & Regular Expressions
Using regular expressions to extract string from text file
Essentially, I have a txt document:
The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering, a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
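Using regular expressions, I need to print everything that is inside double quotes. I don't want complete code; I just need to know how I should go about it and which regular expression would be most useful. Tips and pointers, please!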
r'(".*?")'
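This pattern matches every string in double quotes. The parentheses denote a capturing group. The dot (.) matches any character (except a newline), * means repetition (zero or more times), and ? makes the repetition non-greedy (matching stops at the next double quote). If needed, add the re.DOTALL option to make the dot match newlines as well.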
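As a small illustration of what the capturing group controls, the sketch below runs two patterns on one line from the question: one with the group around the quotes, one with the group inside them. The variable names are only illustrative.

```python
import re

line = '"Never you mind what it is!" the guard retorted. "What are you?"'

# The group wraps the quotes, so the quotes are part of each result.
with_quotes = re.findall(r'(".*?")', line)

# The group sits inside the quotes, so only the inner text is returned.
without_quotes = re.findall(r'"(.*?)"', line)

print(with_quotes)
print(without_quotes)
```

When a pattern contains exactly one group, re.findall returns the text of that group rather than the whole match, which is why moving the parentheses changes the output.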
This should do it (explanation below):
from __future__ import print_function
import re
txt = """The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering,
a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
"""
strings = re.findall(r'"(.*?)"', txt)
for s in strings:
    print(s)
Result:
So-ho!
Yo there! Stand! I shall fire!
Is that the Dover mail?
Never you mind what it is!
What are you?
_Is_ that the Dover mail?
Why do you want to know?
I want a passenger, if it is.
What passenger?
Mr. Jarvis Lorry.
r'"(.*?)"'
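This pattern matches every string in double quotes. The parentheses denote a capture group, so you get only the text without the double quotes. The dot (.) matches any character (except a newline), and * means "zero or more of the last thing", the last thing here being the dot. The ? after the * makes the * "non-greedy", which means it matches as little as it can. Without the ?, each match would run from the first double quote on a line to the last double quote on that same line, so a line containing two quotations would yield a single result that also includes the text between them.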
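To make the greedy versus non-greedy difference concrete, here is a minimal sketch using two lines taken from the question's text. The names lazy and greedy are just labels chosen for this example.

```python
import re

text = (
    '"Never you mind what it is!" the guard retorted. "What are you?"\n'
    '"What passenger?"'
)

# Non-greedy: each match stops at the nearest closing quote.
lazy = re.findall(r'"(.*?)"', text)

# Greedy: each match extends to the last quote on the same line,
# because the dot does not cross newlines by default.
greedy = re.findall(r'"(.*)"', text)

print(lazy)
print(greedy)
```

The greedy version merges the two quotations on the first line into one string, while the second line, which has only one quotation, is unaffected.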
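You can include the re.DOTALL flag so that the dot also matches newlines, if you want to extract strings that span lines. To do that, use re.findall(r'"(.*?)"', txt, re.DOTALL). The newlines will be included in the resulting strings, so you will have to check for that.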
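The effect of re.DOTALL can be sketched as follows. The sample sentence is made up for this example (it is not from the question's text), and collapsing whitespace with split and join is just one possible way to handle the embedded newline.

```python
import re

# Made-up sample: a quotation that continues onto a second line.
text = 'He said, "this quotation\ncontinues on the next line" and left.'

# Without DOTALL the dot cannot cross the newline, so nothing matches.
default = re.findall(r'"(.*?)"', text)

# With DOTALL the quotation is found, newline included.
spanning = re.findall(r'"(.*?)"', text, re.DOTALL)

# One way to deal with the newline: collapse all whitespace runs.
cleaned = [" ".join(s.split()) for s in spanning]

print(default)
print(spanning)
print(cleaned)
```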
The explanation is unavoidably similar to, and based on, @TigerhawkT3's answer. Upvote that answer too!