Python regex: getting the text between two strings

Question

While reading a text, some of its lines contain a string like <h3 class="heading">General Purpose</h3>, and I want to extract only the value General Purpose from it.

d = re.search(re.escape('<h3 class="heading">')+"(.*?)"+re.escape('</h3>'), str(data2))
if d:
    print(d.group(0))

Answer 1

import re

text="""<h3 class="heading">General Purpose</h3>"""
pattern="(<.*?>)(.*)(<.*?>)"

g=re.search(pattern,text)
g.group(2)

Output:

'General Purpose'

If it is a BeautifulSoup object, getting the value is even easier. You won't need a regex at all.

from bs4 import BeautifulSoup

text="""<h3 class="heading">General Purpose</h3>"""
a=BeautifulSoup(text, "html.parser")  # name the parser explicitly to avoid a warning
print(a.select('h3.heading')[0].text)

Output:

General Purpose

Answer 2

Group 0 contains the entire match; you want the contents of group 1:

print(d.group(1))
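
To make the difference concrete, here is a minimal, self-contained sketch. The pattern is the one from the question; the sample `data2` string is an assumption made up for illustration.

```python
import re

# Hypothetical sample input standing in for the asker's data2
data2 = 'before <h3 class="heading">General Purpose</h3> after'

pattern = re.escape('<h3 class="heading">') + "(.*?)" + re.escape('</h3>')
d = re.search(pattern, data2)
if d:
    whole = d.group(0)  # the whole match, tags included
    inner = d.group(1)  # only the text captured by (.*?)
    print(whole)
    print(inner)
```

Group numbers count opening parentheses from left to right, so the single (.*?) here is group 1.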

In general, though, using a regex to parse HTML is not a good idea (although in practice, nested h3 tags should be rare).
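
If you would rather avoid a regex without installing anything, the standard library's html.parser can do the same job. The following is only a sketch: the class name `HeadingText` and the sample input are hypothetical, and it matches the class attribute exactly, so an element with several classes would be skipped.

```python
from html.parser import HTMLParser


class HeadingText(HTMLParser):
    """Hypothetical helper: collects the text of each <h3 class="heading">."""

    def __init__(self):
        super().__init__()
        self.inside = False   # True while between <h3 ...> and </h3>
        self.headings = []    # one string per matching heading

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; exact class match only
        if tag == "h3" and dict(attrs).get("class") == "heading":
            self.inside = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.headings[-1] += data


parser = HeadingText()
parser.feed('<p>intro</p><h3 class="heading">General Purpose</h3>')
parser.close()
headings = parser.headings
print(headings)
```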

Answer 3

Warning: this relies on lookbehind, so it will not work in regex engines that lack it (JavaScript engines before ES2018, for example).

(?<=<h3 class="heading">).*?(?=</h3>)
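
A minimal sketch of the lookaround approach in Python. The pattern is written here without backslashes, since `<`, `>`, `"` and `/` need no escaping in Python's re module; the sample `text` is an assumption for illustration. Because lookarounds do not consume the tags, group 0 is already the inner text.

```python
import re

text = '<h3 class="heading">General Purpose</h3>'  # hypothetical sample line

# Lookbehind and lookahead assert that the tags are present without matching them
pattern = r'(?<=<h3 class="heading">).*?(?=</h3>)'
m = re.search(pattern, text)
result = m.group(0) if m else None
print(result)
```

Note that Python's re module only accepts fixed-width lookbehind, which this pattern satisfies because the opening tag is a literal string.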