Get content inside element tags when parsing

Given my code:

from bs4 import BeautifulSoup

# browser (a Selenium WebDriver) and s_page_url are assumed to be set up earlier
browser.get(s_page_url)
soup = BeautifulSoup(browser.page_source, "html.parser")
s_image_element = soup.find('a', {'id': 'angle-3'})
s_image_href = s_image_element['href']
s_image_url = "http://www.zappos.com" + s_image_href
s_title_element = soup.find('h1', {'class': 'banner'})
print(s_title_element)

It currently produces the following output:

<h1 class="banner">
<a href="http://couture.zappos.com/a-testoni">a. testoni</a>
<meta content="a. testoni" itemtype="brand">
<a href="/p/a-testoni-sport-nappa-calf-sneaker/product/8835012"><span class="ProductName" itemprop="name">Sport Nappa Calf Sneaker</span></a>
</meta></h1>

How can I get the text "a. testoni" from <a href="http://couture.zappos.com/a-testoni">a. testoni</a> and the text "Sport Nappa Calf Sneaker" from <span class="ProductName" itemprop="name">Sport Nappa Calf Sneaker</span>?
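One way to get the two strings individually, sketched under the assumption that the <h1> markup shown above is available as a string (standing in for browser.page_source, so no Selenium is needed). The names html, title, brand and product are illustrative. find() searches all descendants, so it does not matter how html.parser nests the tags around <meta>.

```python
from bs4 import BeautifulSoup

# Static copy of the markup printed in the question (stand-in for browser.page_source).
html = """
<h1 class="banner">
<a href="http://couture.zappos.com/a-testoni">a. testoni</a>
<meta content="a. testoni" itemtype="brand">
<a href="/p/a-testoni-sport-nappa-calf-sneaker/product/8835012"><span class="ProductName" itemprop="name">Sport Nappa Calf Sneaker</span></a>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1", {"class": "banner"})

# find() searches every descendant, so the nesting around <meta> does not matter.
brand = title.find("a").get_text(strip=True)  # text of the first <a> in the <h1>
product = title.find("span", {"class": "ProductName"}).get_text(strip=True)

print(brand)    # a. testoni
print(product)  # Sport Nappa Calf Sneaker
```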

So far, I have tried the following:

print(s_title_element['a'])

but I get the following error:

  File "C:\Python27\lib\site-packages\bs4\element.py", line 958, in __getitem__
    return self.attrs[key]
KeyError: 'a'
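The traceback hints at the cause: Tag.__getitem__ returns self.attrs[key], so square brackets look up an HTML attribute of the tag itself, and <h1 class="banner"> has no attribute named a. The short sketch below (Python 3, with a trimmed-down copy of the markup used only for illustration) contrasts attribute lookup with the ways to reach a child tag.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the question's <h1>, for illustration only.
html = '<h1 class="banner"><a href="http://couture.zappos.com/a-testoni">a. testoni</a></h1>'
h1 = BeautifulSoup(html, "html.parser").h1

# Square brackets (and .get) read attributes of the <h1> itself.
print(h1["class"])   # ['banner']  (class is multi-valued, so bs4 returns a list)
print(h1.get("a"))   # None, because <h1> has no attribute named "a"

# Child tags are reached with dot access (first match) or find().
print(h1.a.get_text())        # a. testoni
print(h1.find("a")["href"])   # http://couture.zappos.com/a-testoni

# This reproduces the error from the question.
try:
    h1["a"]
except KeyError as err:
    print("KeyError:", err)   # KeyError: 'a'
```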
Answer:

print(s_title_element.get_text(strip=True, separator=" "))

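get_text concatenates all the text inside the Tag object. strip=True strips leading and trailing whitespace from each piece of text (dropping pieces that are only whitespace), and separator=" " joins the remaining pieces with a single space.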
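Applied to the markup from the question, the effect of each parameter can be compared side by side. This is a sketch using a static copy of that markup; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Static copy of the <h1> printed in the question.
html = """<h1 class="banner">
<a href="http://couture.zappos.com/a-testoni">a. testoni</a>
<meta content="a. testoni" itemtype="brand">
<a href="/p/a-testoni-sport-nappa-calf-sneaker/product/8835012"><span class="ProductName" itemprop="name">Sport Nappa Calf Sneaker</span></a>
</h1>"""
title = BeautifulSoup(html, "html.parser").find("h1", {"class": "banner"})

raw = title.get_text()                             # newlines between tags are kept
glued = title.get_text(strip=True)                 # pieces trimmed, joined with ""
clean = title.get_text(strip=True, separator=" ")  # pieces trimmed, joined with " "

print(repr(raw))
print(glued)  # a. testoniSport Nappa Calf Sneaker
print(clean)  # a. testoni Sport Nappa Calf Sneaker
```

The content attribute of <meta> does not appear in any of these results, because get_text only collects text nodes, not attribute values.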

From the bs4 source (excerpt):

def get_text(self, separator="", strip=False,
             types=(NavigableString, CData)):
    """
    Get all child strings, concatenated using the given separator.
    """
    ...  # body omitted