Extract nth tags from HTML after specific tag with beautifulsoup
I have the following HTML code:
<h3>Some Heading Text Here 1</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>
<h3>Some Heading Text Here 2</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<h4>Q1</h4>
<p>A1</p>
<h4>Q2</h4>
<p>A2</p>
<h4>Q3</h4>
<p>A3</p>
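I want to extract the HTML starting at the first <h3> and including everything that follows it, up to (but not including) the first <h4> tag.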
Expected output:
<h3>Some Heading Text Here 1</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>
<h3>Some Heading Text Here 2</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
I tried the following approach, with the result shown below:
from bs4 import BeautifulSoup
data = """<h3>Some Heading Text Here 1</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>
<h3>Some Heading Text Here 2</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<h4>Q1</h4>
<p>A1</p>
<h4>Q2</h4>
<p>A2</p>
<h4>Q3</h4>
<p>A3</p>"""
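soup = BeautifulSoup(data, "html.parser")  # explicit parser avoids the "no parser specified" warning
tags = soup.find_all('h3')
text = ""
for i in tags:
    # print(i)
    text = text + str(i)
    # walk forward through the siblings of this <h3> until an <h4> is reached
    for x in i.next_siblings:
        if x.name == 'h4':
            break
        else:
            text = text + str(x)
print(text)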
Output (the second <h3> heading and its two paragraphs appear twice):
<h3>Some Heading Text Here 1</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>
<h3>Some Heading Text Here 2</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<h3>Some Heading Text Here 2</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
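The duplication comes from the outer loop. `find_all('h3')` returns both headings. Walking `next_siblings` from the first <h3> already passes the second <h3> and its paragraphs before stopping at the first <h4>. The loop then restarts at the second <h3> and appends that section again.

The sketch below starts from only the first <h3> by using `soup.find`. It makes two assumptions:

- the `html.parser` backend is used;
- all the tags are siblings at the same level, as in the sample.

The helper name `html_until` and its parameters are hypothetical.

```python
from bs4 import BeautifulSoup

data = """<h3>Some Heading Text Here 1</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>
<h3>Some Heading Text Here 2</h3>
<p>Some paragraph text here</p>
<p>Some paragraph text here</p>
<h4>Q1</h4>
<p>A1</p>
<h4>Q2</h4>
<p>A2</p>
<h4>Q3</h4>
<p>A3</p>"""


def html_until(soup, start_name="h3", stop_name="h4"):
    """Hypothetical helper: return the first <start_name> tag plus its
    following siblings, stopping before the first <stop_name> sibling."""
    start = soup.find(start_name)  # only the FIRST match, unlike find_all()
    if start is None:
        return ""
    parts = [str(start)]
    for sibling in start.next_siblings:
        # text nodes such as the "\n" between tags have .name == None
        if sibling.name == stop_name:
            break
        parts.append(str(sibling))
    return "".join(parts)


soup = BeautifulSoup(data, "html.parser")
result = html_until(soup)
print(result)
```

Because the walk starts once, from a single heading, each section is emitted exactly once. The second <h3> is simply one of the siblings collected on the way to the <h4>.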
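Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.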