
Get text between spans with BeautifulSoup

I am trying to scrape various sites using BeautifulSoup in Python. Say I have the following HTML excerpt:

<div class="member_biography">
<h3>Biography</h3>
<span class="sub_heading">District:</span> AnyState - At Large<br/>
<span class="sub_heading">Political Highlights:</span> AnyTown City Council, 19XX-XX<br/>
<span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>
<span class="sub_heading">Residence:</span> Some Town<br/>
<span class="sub_heading">Religion:</span> Episcopalian<br/>
<span class="sub_heading">Family:</span> Wife, Some Name; two children<br/>
<span class="sub_heading">Education:</span> Some State College, A.A. 19XX; Some Other State College, B.A. 19XX<br/>
<span class="sub_heading">Elected:</span> 19XX<br/>
</div>

I need the result in the following format:

District:              AnyState - At Large
Political Highlights:  AnyTown City Council, 19XX-XX
Born:                  June X, 19XX; AnyTown, Calif.
Residence:             Some Town
Religion:              Episcopalian
Family:                Wife, Some Name; two children
Education:             Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
Elected:               19XX

However, so far I have only managed to get the following:

District:
Political Highlights:
Born:
Residence:
Religion:
Family:
Education:
Elected:

using this code:

import urllib.request
import sys
from bs4 import BeautifulSoup

def main(url):
    fp = urllib.request.urlopen(url)
    site_bytearray = fp.read()
    fp.close()

    #bs_data = BeautifulSoup(site_str,features="html.parser")
    bs_data = BeautifulSoup(site_bytearray,'lxml')
    tmplist = bs_data.find_all('span',{'class':'sub_heading'})
    for item in tmplist:
        print(item.text)
    sys.exit(0)

if __name__ == "__main__":
    main(sys.argv[1])

In short, how do I get from <span class="sub_heading">District:</span> AnyState - At Large<br/> to District: AnyState - At Large, and store the result in a list for further processing?

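The text you want is not inside the span; it is the text node that immediately follows it, which BeautifulSoup exposes as item.next_sibling. Replace your print call with: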

Python 3.6 and later:

print(f'{item.text:<25} {item.next_sibling}') 

Python 3.0 to 3.5:

print('{:<25} {}'.format(item.text, item.next_sibling))

Output:

District:                  AnyState - At Large
Political Highlights:      AnyTown City Council, 19XX-XX
Born:                      June X, 19XX; AnyTown, Calif.
Residence:                 Some Town
Religion:                  Episcopalian
Family:                    Wife, Some Name; two children
Education:                 Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
Elected:                   19XX
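The question also asks for the results to be stored in a list for further processing. Below is a minimal sketch of one way to do that with the same next_sibling idea. The HTML is a shortened copy of the excerpt from the question, inlined so the example runs without a network request. The helper name extract_pairs and the choice of the built-in "html.parser" (instead of lxml) are assumptions made for illustration only.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the excerpt from the question, inlined so this runs offline.
HTML = """
<div class="member_biography">
<h3>Biography</h3>
<span class="sub_heading">District:</span> AnyState - At Large<br/>
<span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>
<span class="sub_heading">Elected:</span> 19XX<br/>
</div>
"""


def extract_pairs(html):
    """Hypothetical helper: return [(label, value), ...] for each span.sub_heading."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for span in soup.find_all("span", class_="sub_heading"):
        sibling = span.next_sibling
        # next_sibling may be None or another tag; only a text node holds the value.
        value = sibling.strip() if isinstance(sibling, NavigableString) else ""
        pairs.append((span.get_text(strip=True), value))
    return pairs


pairs = extract_pairs(HTML)
for label, value in pairs:
    # Left-align the label in a 25-character column, as in the answer above.
    print(f"{label:<25}{value}")
```

Stripping the sibling text removes the leading space that follows each closing span tag, so the values can be compared or stored without extra cleanup. The isinstance check is a guard for pages where a span is followed directly by another tag, or by nothing at all.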

Have you tried using getText()? It always seems to work for me.
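A small sketch of what get_text() (the usual spelling of getText() in bs4) returns here, assuming the same markup as in the question and the built-in "html.parser". Called on a span it yields only the label, much like item.text in the original code. Called on the enclosing div it yields the labels and the values together, so by itself it does not separate them into pairs.

```python
from bs4 import BeautifulSoup

# Shortened copy of the excerpt from the question, inlined so this runs offline.
HTML = """
<div class="member_biography">
<h3>Biography</h3>
<span class="sub_heading">District:</span> AnyState - At Large<br/>
<span class="sub_heading">Elected:</span> 19XX<br/>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# On the span: only the text inside the tag, i.e. the label.
span = soup.find("span", class_="sub_heading")
label_only = span.get_text()

# On the parent div: every text node, stripped and joined with a single space.
bio = soup.find("div", class_="member_biography")
all_text = bio.get_text(" ", strip=True)

print(label_only)
print(all_text)
```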

