How to use BeautifulSoup to get the content between <hr class="calibre2" /> … <hr class="calibre2" />
<hr class="calibre2" />
<h3 class="calibre5">-ability</h3> (in nouns 構成名詞) : <br class="calibre4" />
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">◊ capability 能力 </span></p></blockquote>
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">◊ responsibility 責任 </span></p></blockquote>
<hr class="calibre2" />
<h3 class="calibre5">-ibility</h3> (in nouns 構成名詞) : <br class="calibre4" />
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">◊ capability 能力 </span></p></blockquote>
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">◊ responsibility 責任 </span></p></blockquote>
<hr class="calibre2" />
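The HTML above is part of my soup. I want to get the content between two <hr> elements. Because <hr> has no closing tag, I can't use a simple lookup. I thought I might use something like find_next_elements, but how do I make it stop when it reaches <hr class="calibre2" /> so that I get only the content in between? Thanks.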
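You can iterate over all hr elements and use .find_next_siblings() to walk through the siblings that follow each one. Then, when you hit the next hr, break out of the loop: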
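# assumes soup is a BeautifulSoup object built from the HTML above
for hr in soup.find_all("hr", class_="calibre2"):
    for item in hr.find_next_siblings():
        # the next hr marks the end of the current section
        if item.name == "hr":
            break
        print(item)
    print("-----")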
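Called without arguments, find_next_siblings() returns only tags, so the bare text between the tags (such as "(in nouns 構成名詞) :") is skipped by the loop above. If that text is needed too, a minimal sketch along the following lines, which iterates over the .next_siblings attribute instead, might work. The helper name sections_between_hr and the shortened sample HTML are illustrative assumptions, and html.parser is used here only to avoid depending on lxml.

```python
from bs4 import BeautifulSoup, Tag

# Shortened sample (assumption): one blockquote per section
html = """
<hr class="calibre2" />
<h3 class="calibre5">-ability</h3> (in nouns) : <br class="calibre4" />
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">capability</span></p></blockquote>
<hr class="calibre2" />
<h3 class="calibre5">-ibility</h3> (in nouns) : <br class="calibre4" />
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">responsibility</span></p></blockquote>
<hr class="calibre2" />
"""

soup = BeautifulSoup(html, "html.parser")


def sections_between_hr(soup):
    """Hypothetical helper: one list of nodes (tags and non-blank text) per section."""
    sections = []
    for hr in soup.find_all("hr", class_="calibre2"):
        nodes = []
        # .next_siblings yields text nodes as well as tags
        for node in hr.next_siblings:
            if isinstance(node, Tag) and node.name == "hr":
                break  # the next hr closes the section
            # keep tags, and text that is not just whitespace
            if isinstance(node, Tag) or node.strip():
                nodes.append(node)
        if nodes:  # the last hr has nothing after it
            sections.append(nodes)
    return sections


sections = sections_between_hr(soup)
for nodes in sections:
    for node in nodes:
        print(repr(node))
    print("-----")
```

Each section is returned as a plain list, so the caller can decide whether to join the text, re-serialize the tags, or process them one by one.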
You can use find_all_next (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all-next-and-find-next) and check each tag for the hr name and the calibre2 class:
from bs4 import BeautifulSoup
testStr = """
<hr class="calibre2" />
<h3 class="calibre5">-ability</h3> (in nouns 構成名詞) : <br class="calibre4" />
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">◊ capability 能力 </span></p></blockquote>
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">◊ responsibility 責任 </span></p></blockquote>
<hr class="calibre2" />
<h3 class="calibre5">-ibility</h3> (in nouns 構成名詞) : <br class="calibre4" />
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">◊ capability 能力 </span></p></blockquote>
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">◊ responsibility 責任 </span></p></blockquote>
<hr class="calibre2" />
"""
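soup = BeautifulSoup(testStr, 'lxml')
hrTag = soup.hr  # first <hr> in the document
nextTags = hrTag.find_all_next()  # every tag after it, nested tags included
content = []
for item in nextTags:
    # .get() avoids a KeyError on tags that have no class attribute
    classes = item.get('class', [])
    print("Name %s ; Class %s" % (item.name, classes[0] if classes else None))
    # stop once we have reached the second calibre2 hr
    if item.name == 'hr' and 'calibre2' in classes:
        break
    content.append(item)
print(content)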
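One thing to be aware of with this approach: find_all_next() walks the document in order and yields nested tags as well, so each blockquote is followed by its own p and span, and the same text ends up in content more than once. If only the top-level elements are wanted, one possible sketch is to keep just the tags that share a parent with the starting hr. The variable names and the trimmed sample below are assumptions for illustration, and html.parser stands in for lxml.

```python
from bs4 import BeautifulSoup

# Trimmed sample (assumption): a single hr-delimited section
html = """
<hr class="calibre2" />
<h3 class="calibre5">-ability</h3> (in nouns) : <br class="calibre4" />
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">capability</span></p></blockquote>
<blockquote class="calibre6"><p class="calibre_1"><span class="italic">responsibility</span></p></blockquote>
<hr class="calibre2" />
"""

soup = BeautifulSoup(html, "html.parser")
first_hr = soup.hr

all_tags = []   # everything find_all_next() yields before the next hr
top_level = []  # only the tags that are siblings of the hr

for tag in first_hr.find_all_next():
    if tag.name == "hr" and "calibre2" in tag.get("class", []):
        break
    all_tags.append(tag)
    # nested tags (p, span) have a blockquote as parent, not the hr's parent
    if tag.parent is first_hr.parent:
        top_level.append(tag)

print([t.name for t in all_tags])
print([t.name for t in top_level])
```

The second list contains each blockquote once, with its p and span still reachable inside it.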