
BeautifulSoup - Scraping Multiple Tables from a page?

I am trying to scrape content from this URL, which contains multiple tables. The desired output would be:

NAME        FG% FT% 3PM REB AST STL BLK TO  PTS     SCORE
Team Jackson (0-8)      .4313   .7500   21  71  34  11  12  15  189     1-8-0
Team Keyrouze (4-4)     .4441   .8090   31  130 71  18  13  45  373     8-1-0
Nutz Vs. Draymond Green (4-4)       .4292   .8769   30  86  66  15  9   28  269     3-6-0
Team Pauls 2 da Wall (3-5)      .4784   .8438   40  123 64  18  20  30  316     6-3-0
Team Noey (2-6)     .4350   .7679   21  125 62  20  9   33  278     7-2-0
YOU REACH, I TEACH (2-5-1)      .4810   .7432   20  114 56  30  7   50  277     2-7-0
Kris Kaman His Pants (5-3)      .4328   .8000   20  74  59  20  5   27  238     3-6-0
Duke's Balls In Daniels Face (3-4-1)        .5000   .7045   42  139 38  27  22  30  303     6-3-0
Knicks Tape (5-3)       .5000   .8152   34  143 92  12  9   47  397     4-5-0
Suck MyDirk (5-3)       .4734   .8814   29  106 86  22  17  40  435     5-4-0
In Porzingod We Trust (4-4)     .4928   .7222   27  180 95  16  16  46  423     7-2-0
Team Aguilar (6-1-1)        .4718   .7053   28  177 65  12  35  48  413     2-7-0
Team Li (7-0-1)     .4714   .8118   35  134 74  17  17  47  368     6-3-0
Team Iannetta (4-4)     .4527   .7302   22  125 90  20  13  44  288     3-6-0

If formatting the table this way is too difficult, I'd at least like to know how to scrape all of the tables. My code for scraping all the rows is this:

tableStats = soup.find('table', {'class': 'tableBody'})
rows = tableStats.findAll('tr')

for row in rows:
    print(row.string)

But it only prints the value "TEAM" and nothing else... Why doesn't it include all of the rows in the table?

Thanks.

Instead of searching for the table tag, you should search directly for the more reliable class, e.g. linescoreTeamRow. This snippet should do the trick:

from bs4 import BeautifulSoup
import requests
a = requests.get("http://games.espn.com/fba/scoreboard?leagueId=224165&seasonId=2017")
soup = BeautifulSoup(a.text, 'lxml')
# searching for the rows directly
rows = soup.findAll('tr', {'class': 'linescoreTeamRow'})
# you will need to isolate elements in the row for the table
for row in rows:
    print(row.text)

Found a way to get exactly the 2-D matrix I specified in the question. It is stored as the list teams.

Code:

from bs4 import BeautifulSoup
import requests

source_code = requests.get("http://games.espn.com/fba/scoreboard?leagueId=224165&seasonId=2017")
plain_text = source_code.text
soup = BeautifulSoup(plain_text, 'lxml')
teams = []
rows = soup.findAll('tr', {'class': 'linescoreTeamRow'})

# Creates a 2-D matrix: one list of cell texts per team row.
for row in rows:
    team_row = []
    columns = row.findAll('td')
    for column in columns:
        team_row.append(column.getText())
    print(team_row)
    # Add each team to the teams matrix.
    teams.append(team_row)

Output:

['Team Jackson (0-10)', '', '.4510', '.8375', '41', '135', '101', '23', '11', '50', '384', '', '5-4-0']
['YOU REACH, I TEACH (3-6-1)', '', '.4684', '.7907', '22', '169', '103', '22', '10', '32', '342', '', '4-5-0']
['Nutz Vs. Draymond Green (4-6)', '', '.4552', '.8372', '30', '157', '68', '15', '16', '39', '356', '', '2-7-0']
["Jesse's  Blue Balls (4-5-1)", '', '.4609', '.7576', '47', '158', '71', '30', '20', '38', '333', '', '7-2-0']
['Team Noey (4-6)', '', '.4763', '.8261', '42', '164', '70', '25', '29', '44', '480', '', '5-4-0']
['Suck MyDirk (6-3-1)', '', '.4733', '.8403', '54', '160', '132', '23', '11', '47', '544', '', '4-5-0']
['Kris Kaman  His Pants (5-5)', '', '.4569', '.8732', '53', '138', '105', '27', '21', '53', '465', '', '6-3-0']
['Team Aguilar (6-3-1)', '', '.4433', '.7229', '40', '202', '68', '30', '22', '54', '452', '', '3-6-0']
['Knicks Tape (6-3-1)', '', '.4406', '.8824', '52', '172', '108', '24', '13', '49', '513', '', '6-3-0']
['Team Iannetta (4-6)', '', '.5321', '.6923', '24', '146', '94', '32', '16', '60', '428', '', '3-6-0']
['In Porzingod We Trust (6-4)', '', '.4694', '.6364', '37', '216', '133', '31', '21', '77', '468', '', '4-5-0']
['Team Keyrouze (6-4)', '', '.4705', '.8854', '51', '135', '108', '25', '17', '43', '550', '', '5-4-0']
['Team Li (8-1-1)', '', '.4369', '.8182', '57', '203', '130', '34', '22', '54', '525', '', '6-3-0']
['Team Pauls 2 da Wall (5-5)', '', '.4780', '.5970', '27', '141', '47', '19', '25', '28', '263', '', '3-6-0']
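
To turn that matrix into the aligned table from the question, the empty spacer cells just need to be dropped and a header row printed first. A minimal sketch, assuming the teams list built above (the header names and column widths here are my own additions, not part of the scraped page):

# Hypothetical formatting step: drop the empty spacer cells and print
# each team under the headers from the question.
headers = ['NAME', 'FG%', 'FT%', '3PM', 'REB', 'AST', 'STL', 'BLK', 'TO', 'PTS', 'SCORE']

# Keep only non-empty cells; the scraped rows contain two empty spacer columns.
cleaned = [[cell for cell in row if cell] for row in teams]

# Fixed-width columns; the widths are guesses chosen for readability.
fmt = '{:<36}' + '{:>8}' * (len(headers) - 1)
print(fmt.format(*headers))
for row in cleaned:
    print(fmt.format(*row))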
