For Loop to pass a Variable through a URL in Python
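I am new to Python and am trying to teach myself by doing some simple web scraping to get football stats.
I have managed to get the data one page at a time, but I have not been able to figure out how to add a loop to my code to scrape multiple pages at once (or multiple positions/years/conferences, for that matter).
I have searched a fair amount on this site and others but can't seem to get it right.
Here's my code: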
import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p=1&d-447263-s=PASSING_YARDS&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=PASSING&conference=null&qualified=false'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)
table = soup.find('table', attrs={'class': 'data-table1'})

list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace('&#39;', '')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

#for line in list_of_rows: print ', '.join(line)

outfile = open("./2014.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Rk", "Player", "Team", "Pos", "Comp", "Att", "Pct", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Int", "1st", "1st%", "Lng", "20+", "40+", "Sck", "Rate"])
writer.writerows(list_of_rows)
outfile.close()
Here's my attempt at adding a variable into the URL and building a loop:
import csv
import requests
from BeautifulSoup import BeautifulSoup

pagelist = ["1", "2", "3"]

x = 0
while (x < 500):
    url = "http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p="+str(x)).read(),'html'+"&d-447263-s=RUSHING_ATTEMPTS_PER_GAME_AVG&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=RUSHING&conference=null&qualified=false"
    response = requests.get(url)
    html = response.content

    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'class': 'data-table1'})

    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace('&#39;', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)

    #for line in list_of_rows: print ', '.join(line)

    outfile = open("./2014.csv", "wb")
    writer = csv.writer(outfile)
    writer.writerow(["Rk", "Player", "Team", "Pos", "Att", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Long", "1st", "1st%", "20+", "40+", "FUM"])
    writer.writerows(list_of_rows)
    x = x + 0
    outfile.close()
Thanks in advance.
Here's my revised code, which seems to overwrite each page as it writes it to the CSV file.
import csv
import requests
from BeautifulSoup import BeautifulSoup

url_template = 'http://www.nfl.com/stats/categorystats?tabSeq=0&season=2014&seasonType=REG&experience=&Submit=Go&archive=false&d-447263-p=%s&conference=null&statisticCategory=PASSING&qualified=false'

for p in ['1','2','3']:
    url = url_template % p
    response = requests.get(url)
    html = response.content

    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'class': 'data-table1'})

    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace('&#39;', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)

    #for line in list_of_rows: print ', '.join(line)

    outfile = open("./2014Passing.csv", "wb")
    writer = csv.writer(outfile)
    writer.writerow(["Rk", "Player", "Team", "Pos", "Comp", "Att", "Pct", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Int", "1st", "1st%", "Lng", "20+", "40+", "Sck", "Rate"])
    writer.writerows(list_of_rows)
    outfile.close()
Assuming that you only want to change the page number, you could do something like this and use string formatting:
url_template = 'http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p=%s&d-447263-s=PASSING_YARDS&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=PASSING&conference=null&qualified=false'

for page in [1,2,3]:
    url = url_template % page
    response = requests.get(url)
    # Rest of the processing code can go here
    outfile = open("./2014.csv", "ab")
    writer = csv.writer(outfile)
    writer.writerow(...)
    writer.writerows(list_of_rows)
    outfile.close()
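If you later want to vary more than the page number (season, statistic category, sort column), it may be easier to build the query string from a list of parameters than to splice values into one long string. Here is a minimal sketch of that idea, assuming Python 3 and the standard library's `urllib.parse.urlencode`. The function name `build_url` and its keyword arguments are made up for illustration; the parameter names and values are copied from the URL in the question.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.nfl.com/stats/categorystats"

def build_url(page, season=2014, category="PASSING", sort_key="PASSING_YARDS"):
    """Return the stats URL for one page (hypothetical helper, not part of any library)."""
    # Parameter names and order are taken from the URL in the question.
    params = [
        ("seasonType", "REG"),
        ("d-447263-n", 1),
        ("d-447263-o", 2),
        ("d-447263-p", page),       # the page number is the part that changes
        ("d-447263-s", sort_key),
        ("tabSeq", 0),
        ("season", season),
        ("Submit", "Go"),
        ("experience", ""),
        ("archive", "false"),
        ("statisticCategory", category),
        ("conference", "null"),
        ("qualified", "false"),
    ]
    return BASE_URL + "?" + urlencode(params)

# One URL per page; each can then be passed to requests.get(url).
urls = [build_url(page) for page in range(1, 4)]
```

Looping over `urls` then replaces the hand-maintained counter, so there is no `x` to forget to increment.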
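Note that you should open the file in append mode ("ab") instead of write mode ("wb"), since the latter overwrites existing contents, as you've experienced. With append mode, the new contents are written at the end of the file.
This is outside the scope of the question, and more of a friendly code improvement suggestion, but the script would become easier to think about if it were split up into smaller functions that each do one thing, e.g. get the data from the site, write it to CSV, and so on.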
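A possible alternative to append mode is to open the file once, before the loop, and keep writing to the same writer. That way the header row is written only once, and running the script again starts from a clean file instead of appending to the old one. The sketch below assumes Python 3, where the `csv` module expects a text-mode file opened with `newline=""` (in Python 2 the equivalent mode would be "wb"). The `pages` dictionary holds made-up rows that stand in for the table scraped from each page.

```python
import csv
import os
import tempfile

HEADER = ["Rk", "Player", "Team"]

# Made-up rows standing in for the scraped table of each page.
pages = {
    1: [["1", "Player A", "AAA"], ["2", "Player B", "BBB"]],
    2: [["3", "Player C", "CCC"]],
}

# A temporary location so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "2014.csv")

# Open the file once; every page is written through the same writer.
with open(path, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(HEADER)              # header written a single time
    for page in sorted(pages):
        writer.writerows(pages[page])    # rows accumulate page after page

# Read the file back to check that nothing was overwritten.
with open(path, newline="") as infile:
    saved_rows = list(csv.reader(infile))
```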
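As a rough illustration of that kind of split, here is one way the pieces could be separated, assuming Python 3. All function names are made up for this sketch. The real fetching and parsing step (the `requests` and BeautifulSoup code from the question) is replaced by a stand-in, `fake_fetch_rows`, so the example runs without network access.

```python
import csv
import io

def build_url(template, page):
    """Fill the page number into a URL template containing one %s."""
    return template % page

def collect_rows(template, pages, fetch_rows):
    """Gather the rows of every page; fetch_rows(url) returns a list of rows."""
    rows = []
    for page in pages:
        rows.extend(fetch_rows(build_url(template, page)))
    return rows

def write_csv(outfile, header, rows):
    """Write the header once, followed by all rows."""
    writer = csv.writer(outfile)
    writer.writerow(header)
    writer.writerows(rows)

def fake_fetch_rows(url):
    """Stand-in for the download and parse step; returns one made-up row."""
    page = url.rsplit("=", 1)[-1]
    return [[page, "Player on page " + page]]

# example.com is a placeholder URL, not the real stats site.
all_rows = collect_rows("http://example.com/stats?page=%s", [1, 2, 3], fake_fetch_rows)

buffer = io.StringIO()   # in-memory file, used here instead of a real CSV file
write_csv(buffer, ["Page", "Player"], all_rows)
csv_text = buffer.getvalue()
```

Swapping `fake_fetch_rows` for a function that downloads the page and extracts the table rows would leave the other functions unchanged.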