Searching for a span with Python and BeautifulSoup returns nothing

Question

I am trying to extract a specific piece of text from this link:

http://www1.folha.uol.com.br/fsp/mercado/index-20121030.shtml

I wrote this function to find and extract the text:

def manchete_11112011_30102012(b):
    soup = make_soup(b)
    data = [span.string for span in soup.find("font")]
    noticias = [b.text for b in soup.findAll("a")]
    return {"noticias": noticias,
            "data": data}

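OK. My problem is the "data" line. When it runs, it returns nothing useful. When I write "span.string" it returns [None], and when I write "span.text" it returns [u''].

This is the HTML I am looking at. I need the text content of <span id="spanLongDate">:

<td width="430" align="right"><font size="1"><span id="spanLongDate">São Paulo, terça-feira, 30 de outubro de 2012</span></font><img src="images/mercado.gif" hspace="10" alt="Mercado"></td>

Is there another way to extract the text? I mean, is the code I wrote wrong, or is the text format incompatible? What does [u''] mean?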

Answer 1

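To find the element with id="spanLongDate", use the following snippet:

# get the span you are looking for
span = soup.find("span", attrs={"id": "spanLongDate"})

# get the text out of the span
data = span.get_text()

Note that .find returns only the first match. If you need to find multiple instances, use .find_all instead.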
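As a minimal sketch of the difference between .find and .find_all, the snippet below parses the HTML fragment quoted in the question as an inline string instead of fetching the page. The variable names and the choice of the "html.parser" backend are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question (assumed to be what the page once served)
html = (
    '<td width="430" align="right"><font size="1">'
    '<span id="spanLongDate">São Paulo, terça-feira, 30 de outubro de 2012</span>'
    '</font><img src="images/mercado.gif" hspace="10" alt="Mercado"></td>'
)

soup = BeautifulSoup(html, "html.parser")

# .find returns the first matching tag, or None when nothing matches
span = soup.find("span", attrs={"id": "spanLongDate"})
data = span.get_text() if span is not None else None

# .find_all returns a list of every matching tag (possibly empty)
span_texts = [s.get_text() for s in soup.find_all("span")]

print(data)
print(span_texts)
```

Checking for None before calling .get_text() avoids an AttributeError when the tag is missing from the page.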

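Edited to add:

Based on your comment below, I looked at the page source and even ran the code on my machine. Here is a function that dumps what BeautifulSoup actually sees. This is useful because what you see when viewing the source in a browser is sometimes not what BeautifulSoup receives.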

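import urllib.request
from bs4 import BeautifulSoup

def dumpPage():
    # fetch the page and print the parsed document exactly as BeautifulSoup sees it
    url = "http://www1.folha.uol.com.br/fsp/mercado/index-20121030.shtml"
    print("url is: " + url)
    page = urllib.request.urlopen(url)

    # name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(page.read(), "html.parser")
    print("read soup")
    print(soup)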

When I printed this out and searched for "spanLongDate", I got the following interesting snippet.

<td align="right" width="430"><font size="1"><span id="spanLongDate"></span></font><img alt="Mercado" hspace="10" src="images/mercado.gif"/></td>

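The São Paulo text is not in there. I then pressed F12 in Chrome to look at the original source, and the spanLongDate <span> has no text there either.

Maybe the page has been updated?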
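An empty span would also account for the two results reported in the question. The sketch below, which assumes the "html.parser" backend and uses the dumped fragment above as an inline string, repeats the question's list comprehension: iterating over the <font> tag yields its children, and the only child is the empty span.

```python
from bs4 import BeautifulSoup

# fragment as dumped above: the span is present but has no text
empty_html = (
    '<td align="right" width="430"><font size="1">'
    '<span id="spanLongDate"></span></font>'
    '<img alt="Mercado" hspace="10" src="images/mercado.gif"/></td>'
)

soup = BeautifulSoup(empty_html, "html.parser")
font = soup.find("font")

# iterating over a Tag walks its direct children (here, just the span)
strings = [child.string for child in font]
texts = [child.text for child in font]

print(strings)  # [None]
print(texts)    # ['']
```

.string is None because the span has no child node, while .text joins zero strings into an empty string. Under Python 2 that empty string is displayed as u'', which is the [u''] from the question.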

Answer 2

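If you only want the date, you should look for it elsewhere. If you dump the soup and search for 2012, you will see it in many places. It is easy to pull it out of the title with the following code.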

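import urllib.request
from bs4 import BeautifulSoup

url = "http://www1.folha.uol.com.br/fsp/mercado/index-20121030.shtml"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")
# the <title> tag carries the date as part of its text
theDateTag = soup.find("title")
theDateString = theDateTag.get_text()
print(theDateString)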
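A slightly more defensive variant of the title lookup might look like the sketch below. The helper name title_text and the sample document are hypothetical and do not reflect the real page's title; the point is only that soup.find returns None when the tag is absent.

```python
from bs4 import BeautifulSoup

def title_text(html):
    """Return the stripped <title> text, or None if the document has no title."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("title")
    return tag.get_text(strip=True) if tag is not None else None

# hypothetical document, used only to exercise the helper
sample = "<html><head><title> Example title, 30/10/2012 </title></head><body></body></html>"

print(title_text(sample))
print(title_text("<p>no title here</p>"))
```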

Answer 1
0 2015-09-09 03:24:57

Answer 2
0 2015-09-14 18:11:05