How do I get the content inside a span with BeautifulSoup?

Question

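I'm trying to parse the fixtures from a website. I managed to parse the "match" column, but I'm having trouble parsing the date and time columns.

My program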

import re
import pytz
import requests
import datetime
from bs4 import BeautifulSoup
from espncricinfo.exceptions import MatchNotFoundError, NoScorecardError
from espncricinfo.match import Match

bigbash_article_link = "http://www.espncricinfo.com/ci/content/series/1128817.html?template=fixtures"

r = requests.get(bigbash_article_link)
bigbash_article_html = r.text

soup = BeautifulSoup(bigbash_article_html, "html.parser")


bigbash1_items = soup.find_all("span",{"class": "fixture_date"})
bigbash_items = soup.find_all("span",{"class": "play_team"})
bigbash_article_dict = {}
date_dict = {}

for div in bigbash_items:
    a = div.find('a')['href']
    bigbash_article_dict[div.find('a').string] = a
print(bigbash_article_dict)
for div in bigbash1_items:
    a = div.find('span').string
    date_dict[div.find('span').string] = a
print(date_dict)

When I run this, print(bigbash_article_dict) produces output, but print(date_dict) gives me an error. How can I parse the date and time content?

Answer 1

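Judging by your code, you want the content inside the span tag, so you should use div.contents to get the span's children.

Your question should really be: how do I get the content inside a span with BeautifulSoup?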

For example, the loop below prints something like this for each item:
    div= <span class="fixture_date">
    Thu Feb 22
                            </span>
    div.contents[0].strip()= Thu Feb 22
    ------------



for div in bigbash1_items:
    print("div=", div)
    print("div.contents[0].strip()=", div.contents[0].strip(), "\r\n------------\r\n")
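To try the contents approach without hitting the network, here is a minimal sketch. The inline HTML is a hypothetical sample modelled on the span printed above (the second span and its time value are made up for illustration), so the real page may look different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the span shown above; not fetched from the site.
html = """
<span class="fixture_date">
    Thu Feb 22
                        </span>
<span class="fixture_date">
    7:40 PM
                        </span>
"""

soup = BeautifulSoup(html, "html.parser")

texts = []
for span in soup.find_all("span", {"class": "fixture_date"}):
    # contents is the list of the tag's children; here the only child is a text node.
    # strip() removes the surrounding newlines and indentation.
    texts.append(span.contents[0].strip())

print(texts)
```

This relies on the text being the first child of each span. If a span could start with a nested tag, span.get_text(strip=True) is probably the safer call.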

Answer 2

The elements with the fixture_date class don't contain a <span>; they are the spans themselves. You can get the data from them directly.

So instead of this:

div.find('span').string

you can do this:

div.string
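A small sketch of the difference, using a hypothetical one-line sample rather than the real page: find('span') on the span itself finds nothing and returns None, which is why chaining .string onto it fails, while .string on the span returns its text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the real page's markup may differ.
html = '<span class="fixture_date">\n  Thu Feb 22\n</span>'
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span", {"class": "fixture_date"})

# There is no nested <span>, so find() returns None.
# Calling .string on that None is what raises AttributeError.
inner = span.find("span")

# .string gives the tag's single text child, whitespace included, so strip it.
text = span.string.strip()

print(inner)
print(text)
```

Note that .string is None when a tag has more than one child, so this only works for spans that hold plain text.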

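Judging from the site's structure, this returns the date on odd iterations (1, 3, ...) and the time on even iterations (2, 4, ...).

Oh, and I'd suggest giving your variables meaningful names, so rename div to span, because in your code all the div variables actually hold <span> tags ;)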
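If the spans really do alternate between date and time, one way to pair them up is to slice the list by step. This is only a sketch under that assumption; the sample HTML and its values are hypothetical, and the name fixtures is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that assumes date and time spans strictly alternate.
html = """
<span class="fixture_date">Thu Feb 22</span>
<span class="fixture_date">7:40 PM</span>
<span class="fixture_date">Fri Feb 23</span>
<span class="fixture_date">6:15 PM</span>
"""
soup = BeautifulSoup(html, "html.parser")

values = [
    span.get_text(strip=True)
    for span in soup.find_all("span", class_="fixture_date")
]

# Items 1, 3, ... (indexes 0, 2, ...) are dates; items 2, 4, ... are times.
dates = values[0::2]
times = values[1::2]

# zip() pairs each date with the time that follows it.
fixtures = list(zip(dates, times))
print(fixtures)
```

If the page ever has a date without a time, the pairs would drift out of step, so it is worth checking that len(dates) == len(times) before trusting the result.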


Answer 1
1 accepted 2018-02-24 02:52:01

Answer 2
0 2018-02-24 02:54:37
