Python web scraping: a specific class returns None

Question

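I'm new to programming and have been trying to teach myself some principles of web scraping with baseball data. In the example below, I try to scrape team matchups, game times, and probable pitchers from CBS Sports. The matchups and game times come through without trouble, but the probable pitchers come back as "None".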

from bs4 import BeautifulSoup as Soup
import requests
import pandas as pd
from pandas import DataFrame

# Fetch the schedule page. The parser name belongs to BeautifulSoup, not requests.get.
matchups_response = requests.get('https://www.cbssports.com/mlb/schedule/')

matchups_soup = Soup(matchups_response.text, 'lxml')

# Take the first table on the page
matchups_tables = matchups_soup.find_all('table')

#len(matchups_tables)

matchups_tables = matchups_tables[0]

rows = matchups_tables.find_all('tr')

# rows[0] is the header row, so the first game is rows[1]
first_data_row = rows[1]

# Exploratory checks on a single row
first_data_row.find_all(True, {'class': ['CellPlayerName--short']})

[str(x.string) for x in first_data_row.find_all(True, {'class': ['CellPlayerName--short']})]

# Collect the pitcher cells of one row; .string is what yields None here
def parse_row(row):
    return [str(x.string) for x in row.find_all(True, {'class': ['CellPlayerName--short']})]

list_of_parsed_rows = [parse_row(row) for row in rows[1:31]]

dfPitchers = DataFrame(list_of_parsed_rows)

print(dfPitchers)

This is what it returns:

       0     1
0   None  None
1   None  None
2   None  None
3   None  None
4   None  None
5   None  None
6   None  None
7   None  None
8   None  None
9   None  None
10  None  None
11  None  None

When I use similar code and reference {'class':['TeamName']} or {'class':['CellGame']} instead, I get the correct output:

               0              1
0     Washington        Houston
1         Boston     Pittsburgh
2      Minnesota      Tampa Bay
3   Philadelphia   N.Y. Yankees
4      Milwaukee      Cleveland
5     Cincinnati          Texas
6        Arizona      Chi. Cubs
7      San Diego  San Francisco
8    Kansas City        Seattle
9    L.A. Angels       Colorado
10     N.Y. Mets          Miami
11       Oakland   L.A. Dodgers

0   WAS 0, HOU 0 - 1st
1   BOS 0, PIT 0 - 1st
2              1:05 pm
3              1:05 pm
4              4:05 pm
5              4:05 pm
6              4:05 pm
7              4:05 pm
8              4:10 pm
9              4:10 pm
10             6:40 pm
11             9:05 pm

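But for {'class':['CellPlayerName--short']} it always returns None. Any help would be appreciated. Apologies in advance, I'm a newbie, but I have searched and searched and cannot find a solution I can get to work. Thanks!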

Answer 1

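From the documentation: "If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None."

Instead of .string, use .text / .get_text() to get the result: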

def parse_row(row): return [x.text for x in row.find_all(True, {'class':['CellPlayerName--short']})]
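To see the difference without hitting the live site, here is a minimal sketch on a hand-written snippet. The markup is an assumption (the real CBS Sports cells may be structured differently); only the class names come from the question. Note that the question's `str(x.string)` turns the `None` into the text "None", which is what shows up in the DataFrame.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the cells in the question.
html = """
<table>
  <tr>
    <td><span class="TeamName">Cleveland</span></td>
    <td><span class="CellPlayerName--short"><a href="#">C. Quantrill</a> <span>RHP</span></span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
team = soup.find(class_="TeamName")
pitcher = soup.find(class_="CellPlayerName--short")

print(team.string)     # one text child, so .string returns that text
print(pitcher.string)  # several children, so .string is None
print(pitcher.get_text(" ", strip=True))  # all nested text, joined by spaces
```

This would explain why the TeamName and CellGame lookups worked with .string while the pitcher cells did not: .string only resolves when a tag has exactly one child.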

You can also be more specific in what you select, if you only want the value from the <a>:

def parse_row(row): return [x.a.text for x in row.find_all(True, {'class':['CellPlayerName--short']})]
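One caveat with `x.a.text`: if a cell has no `<a>` child, `x.a` is `None` and the attribute access raises `AttributeError`. The variant below is a sketch that falls back to the cell's own text in that case. The sample row and the "TBD" placeholder are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup

def parse_row(row):
    """Return the pitcher names found in one table row."""
    names = []
    for cell in row.find_all(class_="CellPlayerName--short"):
        link = cell.find("a")
        # Fall back to the cell itself when there is no <a> inside it.
        source = link if link is not None else cell
        names.append(source.get_text(strip=True))
    return names

# Hypothetical row: one cell with a link, one without.
sample = BeautifulSoup(
    """<table><tr>
    <td><span class="CellPlayerName--short"><a href="#">C. Quantrill</a> <span>RHP</span></span></td>
    <td><span class="CellPlayerName--short">TBD</span></td>
    </tr></table>""",
    "html.parser",
)

print(parse_row(sample.tr))
```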

Output

0	1
J. Verlander	C. Edwards
M. Keller	N. Pivetta
D. Rasmussen	B. Ober
C. Schmidt	A. Nola
C. Quantrill	B. Woodruff
Howard	R. Sanmartin
J. Steele	Z. Davies
C. Rodon	M. Clevinger
L. Gilbert	D. Lynch
A. Senzatela	J. Suarez
P. Lopez	C. Bassitt
T. Gonsolin	S. Manaea

Solution 1
2 Accepted 2022-03-29 17:38:50
