Scraping with requests and BS4
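I want to get the contents of the table on the following website and put them into a pandas DataFrame: https://projects.fivethirtyeight.com/soccer-predictions/premier-league/
I'm very new to BS, but I believe what I want is something like: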
import requests
from bs4 import BeautifulSoup
r = requests.get(url = "https://projects.fivethirtyeight.com/soccer-predictions/ligue-1/")
soup = BeautifulSoup(r.text, "html.parser")
#print(soup.prettify())
print(soup.find("div", {"class":"forecast-table"}))
But of course, unfortunately, this returns None. Any help and guidance would be great! I believe what I need is somewhere in here (though I'm not quite sure):
<div id="forecast-table-wrapper">
<table class="forecast-table" id="forecast-table">
<thead>
<tr class="desktop">
<th class="top nosort">
</th>
<th class="top bordered-right rating nosort drop-6" colspan="3">
Team rating
</th>
<th class="top nosort rating2" colspan="1">
</th>
<th class="top bordered-right nosort drop-1" colspan="5">
avg. simulated season
</th>
<th class="top bordered-right nosort show-1 drop-3" colspan="2">
avg. simulated season
</th>
<th class="top bordered nosort" colspan="4">
end-of-season probabilities
</th>
</tr>
<tr class="sep">
<th colspan="11">
</th>
</tr>
Since you are using pandas anyway, you can use its built-in table handling, like this:
import pandas
# read_html returns a list of DataFrames, one for each matching table.
tables = pandas.read_html('https://projects.fivethirtyeight.com/soccer-predictions/premier-league/',
                          attrs={'class': 'forecast-table'}, header=1)
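To see what the attrs and header arguments do without hitting the network, here is a minimal sketch that runs read_html on an inline HTML string. The markup and the numbers in it are a made-up, trimmed-down stand-in for the real page, and the names html, tables and df are only illustrative. It assumes a parser backend for read_html (lxml, or bs4 together with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample markup with two header rows, loosely modelled on the page.
html = """
<table class="forecast-table" id="forecast-table">
  <thead>
    <tr><th></th><th colspan="2">Team rating</th></tr>
    <tr><th>team</th><th>off.</th><th>def.</th></tr>
  </thead>
  <tbody>
    <tr><td>PSG</td><td>3.0</td><td>0.5</td></tr>
    <tr><td>Lyon</td><td>2.1</td><td>0.7</td></tr>
  </tbody>
</table>
"""

# attrs selects only tables whose attributes match; header=1 uses the second
# row (index 1) as the column names and discards the grouping row above it.
tables = pd.read_html(StringIO(html), attrs={"class": "forecast-table"}, header=1)

# read_html always returns a list, so pick the first (and here only) table.
df = tables[0]
print(df)
```

Wrapping the string in StringIO avoids the deprecation warning that recent pandas versions emit when literal HTML is passed directly.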
That's because you are searching for a div, but the element with that class is a table, so it should be:
print(soup.find("table", {"class":"forecast-table"}))
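The difference is easy to check offline. The following is a small sketch on a hypothetical, shortened copy of the markup from the question (the html string is an assumption, not the real page): the class sits on the table element, while the surrounding div only has an id.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modelled on the snippet in the question.
html = """
<div id="forecast-table-wrapper">
  <table class="forecast-table" id="forecast-table">
    <thead><tr><th>team</th></tr></thead>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# No <div> carries the class "forecast-table", so find() returns None.
as_div = soup.find("div", {"class": "forecast-table"})

# The class is on the <table> element, so this lookup succeeds.
as_table = soup.find("table", {"class": "forecast-table"})

# Looking up by id does not depend on the tag name at all.
by_id = soup.find(id="forecast-table")

print(as_div)
print(as_table.name)
print(by_id.name)
```

Searching by id, or by class without a tag name, is one way to avoid this kind of mismatch when you are unsure which element holds the attribute.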
import requests
from bs4 import BeautifulSoup
r = requests.get('https://projects.fivethirtyeight.com/soccer-predictions/ligue-1/')
soup = BeautifulSoup(r.content, 'html.parser')
table = soup.find_all('table', attrs={'class':'forecast-table'})
for i in table:
    tr = i.find_all('tr')
    for l in tr:
        print(l.text)
Output:
Team ratingavg. simulated seasonavg. simulated seasonend-of-season probabilities
teamspioff.def.WDLgoal diff.proj. pts.pts.relegatedrel.qualify for UCLmake UCLwin Ligue 1win league
PSG24 pts90.03.00.530.74.52.9+7897<1%>99%97%
Lyon14 pts76.32.10.719.69.19.3+2768<1%60%2%
Marseille13 pts71.12.00.918.38.311.4+1663<1%40%<1%
Lille19 pts63.71.70.916.78.612.6+9591%24%<1%
St Étienne15 pts62.71.60.914.710.912.4-1553%14%<1%
Montpellier16 pts64.01.50.713.912.411.7+2543%12%<1%
Nice11 pts62.01.60.913.510.014.5-7507%7%<1%
Monaco6 pts65.91.80.913.010.714.2+0508%7%<1%
Rennes8 pts63.41.60.813.010.514.5-3499%6%<1%
Bordeaux14 pts59.21.50.913.09.915.0-6498%5%<1%
Strasbourg12 pts59.21.51.012.610.814.6-2499%5%<1%
Angers11 pts60.41.50.912.610.215.2-54810%4%<1%
Toulouse13 pts58.21.50.911.912.014.1-104811%4%<1%
Dijon FCO10 pts57.71.61.112.28.517.3-124517%2%<1%
Caen10 pts55.61.41.010.812.414.8-104518%3%<1%
Nîmes10 pts54.91.51.110.711.615.6-134420%2%<1%
Reims10 pts55.31.30.910.312.315.4-144321%2%<1%
Nantes6 pts59.01.50.910.410.916.7-144225%1%<1%
Guingamp5 pts57.31.51.010.39.817.9-194130%<1%<1%
Amiens10 pts53.01.31.010.49.018.6-164031%<1%<1%
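In the output above the cells of each row run together, because the text of a whole tr is all of its cells concatenated. If the goal is a pandas DataFrame, one possible approach is to read each cell separately. The sketch below uses a hypothetical inline table with illustrative values instead of the live page, and the names header, rows and df are assumptions made for the example.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup; the values are illustrative only.
html = """
<table class="forecast-table">
  <thead>
    <tr><th>team</th><th>off.</th><th>def.</th></tr>
  </thead>
  <tbody>
    <tr><td>PSG</td><td>3.0</td><td>0.5</td></tr>
    <tr><td>Lyon</td><td>2.1</td><td>0.7</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "forecast-table"})

# Take the text of every cell on its own rather than the text of the whole row.
header = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find("tbody").find_all("tr")
]

# All values are still strings here; convert columns as needed afterwards.
df = pd.DataFrame(rows, columns=header)
print(df)
```

On the real page the header has several rows and some cells span multiple columns, so the header handling would need adapting.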