BeautifulSoup returns some weird text for an <a> tag

Question

I'm new to web scraping and I'm trying to scrape data from this auction website. However, I ran into a weird problem when trying to get the text of an anchor tag.

Here's the HTML:

<div class="mt50">
  <div class="head_011">
    <a id="item_event_title" href="https://www.storyltd.com/auction/auction.aspx?eid=4158">NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART  (16-17 APRIL 2019)</a>
  </div>
</div>

Here's my code:

auction_info = LTD_work_soup.find('a', id='item_event_title').text
print(auction_info)

This prints "Back To Auction Catalogue" instead of the "NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART (16-17 APRIL 2019)" that I expect.

Here's the link to the page.

Thank you.

Answer 1

Here's how you can extract "NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART (16-17 APRIL 2019)" from the webpage:

from bs4 import BeautifulSoup
import requests

# Fetch the lot page and parse it with the built-in HTML parser
page_link = 'https://www.storyltd.com/auction/item.aspx?eid=4158&lotno=2'
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")

# The auction title is stored in the value attribute of an <input> tag
print(page_content.find("input", attrs={"id": "hdnAuctionTitle"}).attrs['value'])

Output:

NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART  (16-17 APRIL 2019)

If you inspect page_content, you will find that this text is stored in the value attribute of an <input> tag.
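To try the lookup without hitting the site, here is a minimal sketch that runs against an inline HTML string. The markup in SAMPLE_HTML is a simplified assumption about what the server returns (the anchor text is the one reported in the question, and the hdnAuctionTitle id comes from the code above); the helper name get_auction_title is hypothetical and only for illustration.

```python
from bs4 import BeautifulSoup

# Assumed, simplified markup: the anchor as served holds different text than
# the browser shows, while an <input> carries the auction title in its value.
SAMPLE_HTML = """
<div class="head_011">
  <a id="item_event_title" href="#">Back To Auction Catalogue</a>
</div>
<input type="hidden" id="hdnAuctionTitle"
       value="NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART  (16-17 APRIL 2019)">
"""


def get_auction_title(html):
    """Return the title stored in the hdnAuctionTitle input, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    hidden = soup.find("input", id="hdnAuctionTitle")
    if hidden is None:
        return None
    # .get() avoids a KeyError when the attribute is missing
    value = hidden.get("value")
    # split/join collapses runs of whitespace inside the title
    return " ".join(value.split()) if value else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
anchor_text = soup.find("a", id="item_event_title").get_text(strip=True)
title = get_auction_title(SAMPLE_HTML)

print(anchor_text)
print(title)
```

Returning None instead of raising makes it easier to skip lots whose pages lack the input; whether that is the right behavior depends on how you want the scraper to fail.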

I hope it helps!
