Beautiful Soup Extracting Data After href (not url)
I'm new to BeautifulSoup and am trying to use it to pull some test data from NHL.com. I've made a start, but I'm pretty lost...
Here is a snippet of the HTML I want to extract data from:
<tr>
<td rowspan="1" colspan="1"> … </td>
<td style="text-align: left;" rowspan="1" colspan="1">
<a href="/ice/player.htm?id=8474564">
Steven Stamkos
</a>
</td>
<td style="text-align: center;" rowspan="1" colspan="1">
<a href="javascript:void(0);" rel="TBL" onclick="loadTeamSpotlight(jQuery(this));" style="border-bottom:1px dotted;">
TBL
</a>
</td>
<td style="text-align: center;" rowspan="1" colspan="1">
C
</td>
<td style="center" rowspan="1" colspan="1">
16
</td>
<td style="center" rowspan="1" colspan="1">
14
</td>
<td style="center" rowspan="1" colspan="1">
9
</td>
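I want to extract the data in these fields for the whole page, so for about 30 different table rows. Here is my Python code so far; I'm not sure where to go from here.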
from bs4 import BeautifulSoup
import requests
r = requests.get("http://www.nhl.com/ice/playerstats.htm?fetchKey=20142ALLSASAll&viewName=summary&sort=points&pg=1")
data = r.text
t_data=[]
soup = BeautifulSoup(data)
table = soup.find('table', {'class': 'data stats'})
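I know it's not much, but I don't know how to proceed. Thanks for any help.
Edit: I solved the problem, and I hope this helps someone in the future. Here is my code: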
from bs4 import BeautifulSoup
import requests
r = requests.get("http://www.nhl.com/ice/playerstats.htm?fetchKey=20142ALLSASAll&viewName=summary&sort=points&pg=1")
player=[]
team=[]
goals=[]
assists=[]
cells=[]
points=[]
i=0
data = r.text
soup = BeautifulSoup(data)
table = soup.find('table', {'class': 'data stats'})
row=[]
for rows in table.find_all('tr'):
    cells = rows.find_all('td')
    if len(cells) == 19:
        player.append(cells[1].find(text=True))
        team.append(cells[2].find(text=True))
        goals.append(cells[5].find(text=True))
        assists.append(cells[6].find(text=True))
        points.append(cells[7].find(text=True))
        print(player[i], team[i], goals[i], assists[i], points[i])
        i = i + 1
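If you want to try the cell-indexing idea without hitting the live site, here is a minimal sketch that parses an inline copy of the row from the question. The `SAMPLE_HTML` string, the `html.parser` choice, and the helper name `parse_row` are my own assumptions for illustration. The sample is trimmed to 8 cells (the real page had 19 per row), and the points cell is taken from the output quoted elsewhere in this thread. It also uses `get_text(strip=True)` so the name comes back without the surrounding newlines.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (sample data, not a live fetch).
SAMPLE_HTML = """
<table class="data stats">
<tr><th>Player</th></tr>
<tr>
<td rowspan="1" colspan="1"> … </td>
<td style="text-align: left;" rowspan="1" colspan="1">
<a href="/ice/player.htm?id=8474564">
Steven Stamkos
</a>
</td>
<td style="text-align: center;" rowspan="1" colspan="1">
<a href="javascript:void(0);" rel="TBL">
TBL
</a>
</td>
<td>C</td>
<td>16</td>
<td>14</td>
<td>9</td>
<td>23</td>
</tr>
</table>
"""


def parse_row(tr, min_cells=8):
    """Hypothetical helper: return a dict for one data row, or None for other rows."""
    cells = tr.find_all("td")
    if len(cells) < min_cells:  # header rows have no <td> cells
        return None
    link = cells[1].find("a")
    return {
        "name": cells[1].get_text(strip=True),       # the text after the href
        "href": link.get("href") if link else None,  # the href itself
        "team": cells[2].get_text(strip=True),
        "goals": cells[5].get_text(strip=True),
        "assists": cells[6].get_text(strip=True),
        "points": cells[7].get_text(strip=True),
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"class": "data stats"})
players = [p for p in (parse_row(tr) for tr in table.find_all("tr")) if p]
print(players)
```

Keeping each row in one dict avoids the index counter `i` and keeps the fields of a player together.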
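I just wanted to post another approach, so you don't have to use 6 different lists to store data that belongs together. There is also a shorter, more elegant way to get all the expected rows.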
# getting data
# ...
from bs4 import BeautifulSoup
from collections import namedtuple

soup = BeautifulSoup(data)

# this is where the data is collected
rows = list()

# named tuple to store the relevant data of one player
Player = namedtuple('Player', ['name', 'team', 'goals', 'assists', 'points'])

# get every row of the tbody in the specified table
for tr in soup.select('table.data.stats tbody tr'):
    # put the text contents of the row's cells in a list
    cellStrings = [cell.find(text=True) for cell in tr.findAll('td')]
    # add the player to the list of rows
    rows.append(
        Player(
            name=cellStrings[1],
            team=cellStrings[2],
            goals=cellStrings[5],
            assists=cellStrings[6],
            points=cellStrings[7]
        )
    )

rows
The result looks like this:
[Player(name=u'Steven Stamkos', team=u'TBL', goals=u'14', assists=u'9', points=u'23'),
Player(name=u'Sidney Crosby', team=u'PIT', goals=u'8', assists=u'15', points=u'23'),
Player(name=u'Ryan Getzlaf', team=u'ANA', goals=u'10', assists=u'12', points=u'22'),
Player(name=u'Alexander Steen', team=u'STL', goals=u'14', assists=u'7', points=u'21'),
Player(name=u'Corey Perry', team=u'ANA', goals=u'11', assists=u'10', points=u'21'),
Player(name=u'Alex Ovechkin', team=u'WSH', goals=u'13', assists=u'7', points=u'20'),
....
and you access it like this:
>>> rows[20].name
u'Bryan Little'
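One thing to keep in mind is that every field scraped this way is a string, so comparisons such as "most goals" need a conversion first. Here is a rough sketch of that step, using three of the rows shown in the output as hard-coded sample data; the helper name `to_ints` is hypothetical and not part of the answer.

```python
from collections import namedtuple

Player = namedtuple('Player', ['name', 'team', 'goals', 'assists', 'points'])

# Sample rows copied from the output above (strings, as scraped).
rows = [
    Player('Steven Stamkos', 'TBL', '14', '9', '23'),
    Player('Sidney Crosby', 'PIT', '8', '15', '23'),
    Player('Ryan Getzlaf', 'ANA', '10', '12', '22'),
]


def to_ints(player):
    """Hypothetical helper: copy of `player` with the numeric fields as int."""
    return player._replace(
        goals=int(player.goals),
        assists=int(player.assists),
        points=int(player.points),
    )


numeric_rows = [to_ints(p) for p in rows]

# With real integers, max() compares 14 > 8 numerically rather than as text.
top_scorer = max(numeric_rows, key=lambda p: p.goals)
print(top_scorer.name, top_scorer.goals)
```

`_replace` returns a new tuple and leaves the original row untouched, which fits the immutable nature of namedtuples.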
You didn't say exactly which data you need, but you can proceed along these lines:
from bs4 import BeautifulSoup
...
table = soup.find('table', {'class': 'data stats'})
# find_all returns every row; find would return only the first one
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    for col in cols:
        print(col.text)
        link = col.find("a")
        if link:
            print(link.get("href"), link.get("rel"), link.get("onclick"), link.text)
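To see what the link attributes look like in practice, here is a small sketch that runs against two cells copied from the question's HTML instead of the live page. The `SAMPLE_HTML` string, the `html.parser` choice, and the `player_links` filter are assumptions of mine. Note that bs4 treats `rel` as a multi-valued attribute, so `get("rel")` gives back a list rather than a plain string.

```python
from bs4 import BeautifulSoup

# Two cells copied from the question's HTML snippet (sample data, not a live fetch).
SAMPLE_HTML = """
<table class="data stats"><tr>
<td><a href="/ice/player.htm?id=8474564">
Steven Stamkos
</a></td>
<td><a href="javascript:void(0);" rel="TBL" onclick="loadTeamSpotlight(jQuery(this));">
TBL
</a></td>
</tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

links = []
for a in soup.find_all("a", href=True):
    links.append({
        "href": a["href"],
        "rel": a.get("rel"),              # a list in bs4, or None if absent
        "text": a.get_text(strip=True),   # the text after the href
    })

# Keep only links that point at a page rather than a javascript: placeholder.
player_links = [l for l in links if not l["href"].startswith("javascript:")]

for link in links:
    print(link)
```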