
Using regular expressions to extract string from text file

Essentially, I have a txt document:

The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering, a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.

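Using regular expressions, I need to print everything that appears in double quotes. I don't want complete code; I just need to know how I should go about it and which regular expression would be most useful. Hints and pointers, please!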

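r'(".*?")' will match every string in double quotes. The parentheses denote a capture group, . matches any character (except a newline), * means repetition, and ? makes the repetition non-greedy (it stops matching at the next double quote). If needed, add the re.DOTALL option to make . match newlines as well.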
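As a minimal sketch of this pattern, the snippet below runs it on one line from the question's text. The variable names line and quoted are only illustrative. Because the parentheses wrap the quotation marks, each result keeps its surrounding double quotes.

```python
import re

# One line taken from the question's sample text; the names are illustrative.
line = '"Never you mind what it is!" the guard retorted. "What are you?"'

# The capture group encloses the quotation marks, so they stay in each result.
quoted = re.findall(r'(".*?")', line)

for q in quoted:
    print(q)
```

Moving the parentheses inside the quotation marks, as in r'"(.*?)"', returns the same strings without the quotes.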

This should do it (explanation below):

from __future__ import print_function

import re

txt = """The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering,
a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
"""

strings = re.findall(r'"(.*?)"', txt)

for s in strings:
    print(s)

Result:

So-ho!
Yo there! Stand! I shall fire!
Is that the Dover mail?
Never you mind what it is!
What are you?
_Is_ that the Dover mail?
Why do you want to know?
I want a passenger, if it is.
What passenger?
Mr. Jarvis Lorry.

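r'"(.*?)"' will match every string in double quotes. The parentheses denote a capture group, so you only get the text without the double quotes. . matches any character (except a newline), and * means "zero or more of the last thing", the last thing being the . character class. The ? makes the * "non-greedy", which means it matches as little as possible. Without the ?, the match is greedy: each match runs from the first double quote to the last double quote it can reach (the last one on the line, or the last one in the whole text if re.DOTALL is set), so separate quotations on the same line are merged into a single result.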
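To make the greedy versus non-greedy difference concrete, here is a small sketch on a single line from the sample text that contains two quotations. The variable names lazy and greedy are only illustrative.

```python
import re

# A line from the sample text that contains two separate quotations.
line = '"Never you mind what it is!" the guard retorted. "What are you?"'

# Non-greedy: each match stops at the nearest closing double quote.
lazy = re.findall(r'"(.*?)"', line)

# Greedy: the match extends to the last double quote on the line.
greedy = re.findall(r'"(.*)"', line)

print(lazy)    # two separate quotations
print(greedy)  # one string that also swallows the narration in between
```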

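You can include the re.DOTALL flag so that . also matches newlines, in case you want to extract strings that span multiple lines. To do that, use re.findall(r'"(.*?)"', txt, re.DOTALL). The newlines are then included in the extracted strings, so you will have to check for them.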
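The following sketch shows the effect of re.DOTALL on a quotation that wraps across two lines. The input string text is a hypothetical example (not from the question's file), and collapsing whitespace with split and join is just one possible way to deal with the embedded newline.

```python
import re

# Hypothetical input: a quotation that continues on the next line.
text = 'A voice called from the mist, "Is that\nthe Dover mail?"'

# Without the flag, . cannot cross the newline, so nothing is found.
without_flag = re.findall(r'"(.*?)"', text)

# With re.DOTALL, . also matches the newline, so the quotation is found.
with_flag = re.findall(r'"(.*?)"', text, re.DOTALL)

# One way to handle the embedded newline: collapse whitespace runs to one space.
cleaned = [" ".join(s.split()) for s in with_flag]

print(without_flag)
print(with_flag)
print(cleaned)
```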

The explanation is inevitably similar to, and based on, @TigerhawkT3's answer. Upvote that answer too!

