Extracting text nested in multiple tags with Beautiful Soup - Python

Question

I want to use Beautiful Soup to extract the text "12:25 AM - 30 Mar 2015" from the HTML below. This is how the HTML looks after being processed by BS:

<span class="u-floatLeft"> · </span>
<span class="u-floatLeft">
<a class="ProfileTweet-timestamp js-permalink js-nav js-tooltip" href="/TBantl/status/582333634931126272" title="5:08 PM - 29 Mar 2015">
<span class="js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1427674132">
Mar 29
  </span>

I have the following code, but it doesn't work:

date = soup.find("a",attrs={"class":"ProfileTweet-timestamp js-permalink js-nav js-tooltip"})["title"]

Answer 1

This works for me:

from bs4 import BeautifulSoup

html = """<span class="u-floatLeft">&nbsp;·&nbsp;</span>
          <span class="u-floatLeft">
          <a class="ProfileTweet-timestamp js-permalink js-nav js-tooltip" href="/indoz1/status/582443448927543296" title="12:25 AM - 30 Mar 2015">
          <span class="js-short-timestamp " data-aria-label-part="last" data-time="1427700314" data-long-form="true">
       """
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
date = soup.find("a", attrs={"class": "ProfileTweet-timestamp js-permalink js-nav js-tooltip"})["title"]

>>> print(date)
12:25 AM - 30 Mar 2015
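
As a side note, matching on the full class string only succeeds when the attribute is written exactly that way. Below is a minimal sketch of a looser lookup, assuming the ProfileTweet-timestamp class alone is enough to identify the link (that is my assumption, not something the page guarantees). The HTML is a trimmed copy of the snippet above with the tags closed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the snippet above, with closing tags added for illustration.
html = """<span class="u-floatLeft">
<a class="ProfileTweet-timestamp js-permalink js-nav js-tooltip" href="/indoz1/status/582443448927543296" title="12:25 AM - 30 Mar 2015">
<span class="js-short-timestamp " data-aria-label-part="last" data-time="1427700314" data-long-form="true"></span>
</a>
</span>"""

soup = BeautifulSoup(html, "html.parser")

# Match on a single class; the other classes on the tag may appear in any order.
link = soup.find("a", class_="ProfileTweet-timestamp")
date_by_class = link["title"]

# The same lookup expressed as a CSS selector.
date_by_selector = soup.select_one("a.ProfileTweet-timestamp")["title"]

print(date_by_class)
print(date_by_selector)
```

Both lookups read the title attribute of the same tag, so either one can stand in for the attrs={"class": ...} call if the class list on the page changes.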

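Without more information, I suspect you did not convert the HTML snippet into a BeautifulSoup object. In that case you are actually calling str.find() on the raw string, and you will get TypeError: find() takes no keyword arguments.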

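Or, as alexce pointed out in the comments above, the item you are looking for may not actually be in the HTML you are parsing. In that case, find() returns None, and indexing it with ["title"] raises TypeError: 'NoneType' object is not subscriptable.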
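
To tell the two cases apart, something like the following may help. It is only a sketch: the one-line HTML strings are made up for illustration, and the variable names are mine.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document, used only to illustrate the two failure modes.
html = '<a class="ProfileTweet-timestamp" title="12:25 AM - 30 Mar 2015"></a>'

# Case 1: find() is called on the raw string instead of a soup object.
# str.find() does not accept keyword arguments, so this raises TypeError.
try:
    html.find("a", attrs={"class": "ProfileTweet-timestamp"})
    raw_string_error = None
except TypeError as exc:
    raw_string_error = exc

# Case 2: the tag is not in the parsed HTML, so find() returns None.
soup = BeautifulSoup("<p>no timestamp here</p>", "html.parser")
missing = soup.find("a", attrs={"class": "ProfileTweet-timestamp"})

# Check for None before indexing to avoid a TypeError on the subscript.
date = missing["title"] if missing is not None else None

print(type(raw_string_error).__name__)
print(missing, date)
```

If the first case applies, wrap the string in BeautifulSoup(...) before searching. If the second applies, it is worth printing the parsed HTML to confirm the tag is really there.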

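Finally, and completely unrelated to the problem above: if your next step is to parse date into a datetime object, there is an easier way. Just get the "data-time" attribute from <span class="js-short-timestamp " ... > and parse it with datetime.datetime.fromtimestamp: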

from datetime import datetime as dt

# get the "data-time" attribute value as a string named timestamp
timestamp = soup.find("span", attrs={"class": "js-short-timestamp"})["data-time"]
data_time = dt.fromtimestamp(int(timestamp))

>>> data_time  # local time, so the exact value depends on your machine's timezone
datetime.datetime(2015, 3, 30, 3, 25, 14)
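
Because fromtimestamp without a tz argument returns local time, the result differs from machine to machine. A small sketch of a timezone-aware version, using only the standard library. The UTC-7 offset is my assumption, chosen because it reproduces the "12:25 AM" shown in the title attribute.

```python
from datetime import datetime, timedelta, timezone

timestamp = "1427700314"  # the data-time value from the snippet above

# Interpret the epoch seconds as UTC so the result does not depend on the machine.
utc_time = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
print(utc_time.isoformat())  # 2015-03-30T07:25:14+00:00

# Assumed offset (UTC-7), picked only because it matches the title attribute.
assumed_offset = timezone(timedelta(hours=-7))
local_time = utc_time.astimezone(assumed_offset)
print(local_time.isoformat())  # 2015-03-30T00:25:14-07:00
```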

Answer 1
1 accepted 2015-03-31 18:13:58
