What am I doing wrong in this for loop for Web Scraping with bs4?
I'm trying to loop through a list of players on Transfermarkt, enter each profile, get their profile picture, and then scrape the raw list of info. The latter I've already achieved (as you'll see in my code), but the former I can't seem to get working. I'm not an expert at this and have had help with my code.

Rather than the image itself, I want to save the source link of each player's picture and then store that link in "PlayerImgURL" in my dataframe (line 73).

Here is my error message:
(.venv) PS C:\Users\cljkn\Desktop\Python scraper github> & "c:/Users/cljkn/Desktop/Python scraper github/.venv/Scripts/python.exe" "c:/Users/cljkn/Desktop/Python scraper github/.vscode/test.py"
File "c:/Users/cljkn/Desktop/Python scraper github/.vscode/test.py", line 45
for page in range(1, 21):
^
SyntaxError: invalid syntax
Thank you.
from bs4 import BeautifulSoup
import requests
import pandas as pd

playerID = []
playerImage = []
playerName = []
result = []

for page in range(1, 21):
    r = requests.get("https://www.transfermarkt.com/spieler-statistik/wertvollstespieler/marktwertetop?land_id=0&ausrichtung=alle&spielerposition_id=alle&altersklasse=alle&jahrgang=0&kontinent_id=0&plus=1",
                     params={"page": page},
                     headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
                     )
    soup = BeautifulSoup(r.content, "html.parser")
    links = soup.select('a.spielprofil_tooltip')
    for i in range(len(links)):
        playerID.append(links[i].get('id'))

for i in range(len(playerID)):
    playerID[i] = 'https://www.transfermarkt.com/kylian-mbappe/profil/spieler/' + playerID[i]

playerID = list(set(playerID))

for i in range(len(playerID)):
    r = requests.get(playerID[i],
                     params={"page": page},
                     headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
                     )
    soup = BeautifulSoup(r.content, "html.parser")
    name = soup.find_all('h1')
    for image in soup.find_all('img'):
        playerName.append('title')
        playerImage.append[image.get('src')

for page in range(1, 21):
    r = requests.get("https://www.transfermarkt.com/spieler-statistik/wertvollstespieler/marktwertetop?land_id=0&ausrichtung=alle&spielerposition_id=alle&altersklasse=alle&jahrgang=0&kontinent_id=0&plus=1",
                     params={"page": page},
                     headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
                     )
    soup = BeautifulSoup(r.content, "html.parser")
    tr = soup.find_all("tbody")[1].find_all("tr", recursive=False)
    result.extend([
        {
            "Club": t[4].find("img")["alt"],
            "Age": t[2].text.strip(),
            "GamesPlayed": t[6].text.strip(),
            "GoalsDone": t[7].text.strip(),
            "OwnGoals": t[8].text.strip(),
            "Assists": t[9].text.strip(),
            "YellowCards": t[10].text.strip(),
            "SecondYellow": t[11].text.strip(),
            "StraightRed": t[12].text.strip(),
            "SubsOn": t[13].text.strip(),
            "SubsOff": t[14].text.strip(),
            "Nationality": t[3].find("img")["alt"],  # for all nationalities: [i["alt"] for i in t[3].find_all("img")]
            "Position": t[1].find_all("td")[2].text,
            "Value": t[5].text.strip(),
            #"PlayerImgURL":
            "ClubImgURL": t[4].find("img")["src"],
            "CountryImgURL": t[3].find("img")["src"]  # for all country URLs: [i["src"] for i in t[3].find_all("img")]
        }
        for t in (t.find_all(recursive=False) for t in tr)
    ])

df = pd.DataFrame(result, {'Name': playerImage, 'Source': playerImage})
#df.to_csv(r'S:\_ALL\Internal Projects\Introduction_2020\Transfermarkt\PlayerDetails.csv', index=False, header=True)
print(df)
The problem is in this line:
playerImage.append[image.get('src')
Try replacing it with this line:
playerImage.append(image.get('src'))
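To see the fix in context, here is a minimal sketch of the inner image loop with the corrected call. The inline HTML is invented stand-in markup for one fetched profile page, not real Transfermarkt HTML; it also uses `image.get('title')` rather than the literal string `'title'`, which is likely what you intended for `playerName`:

```python
from bs4 import BeautifulSoup

# Invented stand-in for one fetched profile page (not real Transfermarkt markup)
html = """
<h1>Kylian Mbappe</h1>
<img title="Kylian Mbappe" src="https://example.com/mbappe.jpg">
<img title="Club badge" src="https://example.com/badge.png">
"""

soup = BeautifulSoup(html, "html.parser")

playerName = []
playerImage = []
for image in soup.find_all('img'):
    # image.get('title') reads the attribute; the bare string 'title'
    # would just append the same literal text every iteration
    playerName.append(image.get('title'))
    # Parentheses call the method; square brackets (append[...]) try to
    # subscript the method object, which left your bracket unclosed and
    # produced the SyntaxError reported on the *next* line of your file
    playerImage.append(image.get('src'))

print(playerImage)
```

Note that a SyntaxError from an unclosed bracket is often reported on the line after the real mistake, which is why Python pointed at your `for page in range(1, 21):` on line 45.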