How to extract text from <br> tags in Python?

Question

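I am trying to extract the text that sits around the br tags. How can this be done?

So far I have tried different combinations without success. I am a complete beginner.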

from requests import get
from bs4 import BeautifulSoup


article = "https://www.readlightnovel.org/martial-god-asura/chapter-4095"
r = get(article, headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)"})
soup = BeautifulSoup(r.content, "lxml")
title = soup.find(class_="block-title")
print(title.text.strip())
full_article = soup.find("div", {"class": "desc"})
for br_tags in full_article:
    desc = br_tags.findAll("br")
    print(desc.text.strip())

Answer 1

This code prints the text of the novel. It does some basic preprocessing (removing ads, etc.):

from requests import get
from bs4 import BeautifulSoup

article = "https://www.readlightnovel.org/martial-god-asura/chapter-4095"
r = get(article, headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)"})
soup = BeautifulSoup(r.content, "lxml")
title = soup.find(class_="block-title")
print(title.text.strip())

full_article = soup.find("div", {"class": "desc"})

# remove ads inside the text:
for ads in full_article.select('center, small, a, .hidden'):
    ads.extract()

print( full_article.get_text(strip=True, separator='\n') )

Prints:

Martial God Asura - Chapter 4095
Chapter 4095
Hundreds of millions of lightning bolts struck down from the sky and continued to bombard Chu Feng’s location .
Being bombarded in such a manner, everyone felt that Chu Feng had long been killed .
Because of that, they were feeling confused .
Since Chu Feng was already dead, why was the lightning still attacking his location nonstop?
Furthermore, the attacks were growing fiercer and fiercer .
Confused, the crowd’s eyes turned to Chu Feng’s location, their gazes fixed onto that place .
Because of that, no one noticed that the humanoid body of light in the clouds had actually lowered its head . It was as if it were looking at Chu Feng’s location .
In fact, that was exactly what was happening .

... and so on.
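
The same clean-up idea can be tried offline on a small snippet. The following is only a minimal sketch: `SAMPLE_HTML` and the helper name `extract_lines` are made up for illustration, and the real page's markup may differ. It uses `decompose()` to drop the unwanted elements and `stripped_strings` to collect the remaining text nodes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<div class="desc">
  First paragraph.<br/>
  <center>Sponsored content</center>
  Second paragraph.<br/>
  <a href="#">Next chapter</a>
  <small>ad</small>
  Third paragraph.<br/>
</div>
"""


def extract_lines(html):
    """Return the non-empty text lines of the div with class 'desc'."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "desc"})
    if container is None:
        return []
    # Drop elements that do not belong to the article text.
    for unwanted in container.select("center, small, a, .hidden"):
        unwanted.decompose()
    # stripped_strings yields each text node with surrounding whitespace removed,
    # skipping nodes that are empty after stripping.
    return list(container.stripped_strings)


lines = extract_lines(SAMPLE_HTML)
print("\n".join(lines))
```

Returning a list instead of printing directly makes it easy to post-process the lines (for example, joining sentences that were split across text nodes).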

Answer 2

If you only want the text that sits between two <br /> tags, you can do the following:

from bs4 import BeautifulSoup, NavigableString, Tag

html = '''<br />
Important Text 1
<br />
<br />
Not Important Text
<br />
Important Text 2
<br />
Important Text 3
<br />
<br />
Non Important Text
<br />
Important Text 4
<br />'''

# html.parser keeps the fragment flat, so the <br> tags and text nodes stay siblings
soup = BeautifulSoup(html, "html.parser")

for br in soup.find_all('br'):
    next_s = br.next_sibling
    # skip <br> tags that are not directly followed by a text node
    if not (next_s and isinstance(next_s, NavigableString)):
        continue
    next2_s = next_s.next_sibling
    # keep the text only if another <br> closes it
    if next2_s and isinstance(next2_s, Tag) and next2_s.name == 'br':
        text = str(next_s).strip()
        if text:
            print("Found:", text)
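
The sibling check can also be wrapped in a small function that returns the matches instead of printing them. This is a sketch only, assuming Beautiful Soup 4 with the built-in `html.parser`; the function name `text_between_br` and the sample string are hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def text_between_br(html):
    """Return stripped text nodes that sit directly between two <br> tags."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for br in soup.find_all("br"):
        node = br.next_sibling
        # Only a text node directly after the <br> is of interest.
        if not isinstance(node, NavigableString):
            continue
        after = node.next_sibling
        # The text counts only if another <br> follows it.
        if isinstance(after, Tag) and after.name == "br":
            text = node.strip()
            if text:
                found.append(text)
    return found


sample = "<div>intro<br/>first<br/><br/>second<br/>outro</div>"
print(text_between_br(sample))
```

Text before the first <br> and after the last one ("intro" and "outro" in the sample) is skipped because it is not enclosed by two tags.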

Answer 3

You just need to look at the next sibling:

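# find_all returns a list of tags, so read next_sibling on each <br> individually
for br in full_article.find_all("br"):
    text = br.next_sibling
    # next_sibling may be None or another tag; keep only non-empty text nodes
    if isinstance(text, str) and text.strip():
        print(text.strip())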

This may help you get the text that follows each br tag. I just took the snippet from your code.
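
When the text after a br tag contains inline markup (for example a bold word), a single `next_sibling` lookup only returns the first piece. One possible variation, sketched here under the assumption of Beautiful Soup 4 and with a hypothetical helper name `text_after_each_br`, walks `next_siblings` until the next br tag and joins everything in between.

```python
from bs4 import BeautifulSoup, Tag


def text_after_each_br(html):
    """For every <br>, join the sibling text up to the next <br>."""
    soup = BeautifulSoup(html, "html.parser")
    chunks = []
    for br in soup.find_all("br"):
        parts = []
        for sibling in br.next_siblings:
            # Stop at the next <br>; it starts a new chunk.
            if isinstance(sibling, Tag) and sibling.name == "br":
                break
            # Tags contribute their inner text, text nodes contribute themselves.
            parts.append(sibling.get_text() if isinstance(sibling, Tag) else str(sibling))
        text = "".join(parts).strip()
        if text:
            chunks.append(text)
    return chunks


sample = '<div class="desc">one<br/>two <b>bold</b> words<br/><br/>three</div>'
print(text_after_each_br(sample))
```

Text that precedes the first br tag ("one" in the sample) is not returned, since the walk always starts from a br tag.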

Answer 1
Score 2, accepted, 2020-05-14 17:34:41

Answer 2
Score 0, 2020-05-14 17:26:07

Answer 3
Score 0, 2020-05-14 17:30:41
