What am I doing wrong in this for loop for web scraping with bs4?

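I'm trying to loop through a list of players on Transfermarkt, open each player's profile, get their profile picture, and then scrape the original list of information. I have managed the latter (as you will see in my code), but I can't get the former to work. I'm not an expert at this and had help with my code.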

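I want to save the source link of each player's picture, not the image itself, and then store that link under "PlayerImgURL" in my DataFrame (line 73 of my script, where the "PlayerImgURL" key is commented out).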
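For reference, one way to do this with BeautifulSoup is to parse a downloaded profile page, select the profile picture's img tag and keep only its src attribute. This is a sketch under assumptions: the CSS selector img.data-header__profile-image is a guess at Transfermarkt's markup (check the real page in the browser's inspector), extract_player_image_url is a hypothetical helper name, and the sample HTML is hand-written, not real Transfermarkt output.

```python
from bs4 import BeautifulSoup

# ASSUMPTION: a guess at the profile-picture markup on a Transfermarkt player
# page. Inspect the real page and replace this selector if it does not match.
PROFILE_IMG_SELECTOR = "img.data-header__profile-image"


def extract_player_image_url(html, selector=PROFILE_IMG_SELECTOR):
    """Hypothetical helper: return the src link of the profile picture, or None."""
    soup = BeautifulSoup(html, "html.parser")
    img = soup.select_one(selector)  # first matching <img>, or None
    # Keep only the link (the src attribute), not the image itself.
    return img.get("src") if img is not None else None


# Hand-written sample HTML, not real Transfermarkt output.
sample_html = (
    '<div class="data-header">'
    '<img class="data-header__profile-image" src="https://example.com/player.jpg">'
    "</div>"
)
row = {"PlayerImgURL": extract_player_image_url(sample_html)}
print(row)  # {'PlayerImgURL': 'https://example.com/player.jpg'}
```

Stored as the "PlayerImgURL" key of a row dict, the link becomes a column when the list of dicts is passed to pd.DataFrame.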

This is my error message:

(.venv) PS C:\Users\cljkn\Desktop\Python scraper github> & "c:/Users/cljkn/Desktop/Python scraper github/.venv/Scripts/python.exe" "c:/Users/cljkn/Desktop/Python scraper github/.vscode/test.py"
  File "c:/Users/cljkn/Desktop/Python scraper github/.vscode/test.py", line 45
    for page in range(1, 21):
    ^
SyntaxError: invalid syntax

Thanks.

from bs4 import BeautifulSoup
import requests
import pandas as pd

playerID = []
playerImage = []
playerName = []
result = []

for page in range(1, 21):

    r = requests.get("https://www.transfermarkt.com/spieler-statistik/wertvollstespieler/marktwertetop?land_id=0&ausrichtung=alle&spielerposition_id=alle&altersklasse=alle&jahrgang=0&kontinent_id=0&plus=1",
        params= {"page": page},
        headers= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
    )
    soup = BeautifulSoup(r.content, "html.parser")

    links = soup.select('a.spielprofil_tooltip')

    for i in range(len(links)):
        playerID.append(links[i].get('id'))

    for i in range(len(playerID)):
        playerID[i] = 'https://www.transfermarkt.com/kylian-mbappe/profil/spieler/'+playerID[i]
        playerID = list(set(playerID))

    for i in range(len(playerID)):

        r = requests.get(playerID[i],
            params= {"page": page},
            headers= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
        )
    soup = BeautifulSoup(r.content, "html.parser")

    name = soup.find_all('h1')

    for image in soup.find_all('img'):
        playerName.append('title')

        playerImage.append[image.get('src')




    for page in range(1, 21):

        r = requests.get("https://www.transfermarkt.com/spieler-statistik/wertvollstespieler/marktwertetop?land_id=0&ausrichtung=alle&spielerposition_id=alle&altersklasse=alle&jahrgang=0&kontinent_id=0&plus=1",
            params= {"page": page},
            headers= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
        )
        soup = BeautifulSoup(r.content, "html.parser")


        tr = soup.find_all("tbody")[1].find_all("tr", recursive=False)

        result.extend([
            { 

            "Club": t[4].find("img")["alt"],
            "Age": t[2].text.strip(),
            "GamesPlayed": t[6].text.strip(),
            "GoalsDone": t[7].text.strip(),
            "OwnGoals": t[8].text.strip(),
            "Assists": t[9].text.strip(),
            "YellowCards": t[10].text.strip(),
            "SecondYellow": t[11].text.strip(),
            "StraightRed": t[12].text.strip(),
            "SubsOn": t[13].text.strip(),
            "SubsOff": t[14].text.strip(),
            "Nationality": t[3].find("img")["alt"], # for all nationality : [ i["alt"] for i in t[3].find_all("img")], 
            "Position": t[1].find_all("td")[2].text,
            "Value": t[5].text.strip(),
            #"PlayerImgURL":
            "ClubImgURL": t[4].find("img")["src"],
            "CountryImgURL": t[3].find("img")["src"] # for all country url: [ i["src"] for i in t[3].find_all("img")]
            }

            for t in (t.find_all(recursive=False) for t in tr)
        ])



df = pd.DataFrame(result,{'Name':playerImage, 'Source':playerImage})


#df.to_csv (r'S:\_ALL\Internal Projects\Introduction_2020\Transfermarkt\PlayerDetails.csv', index = False, header=True)

print(df)
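Separately from the SyntaxError, the ID-to-URL loop in the code above has two problems worth knowing about. playerID keeps growing across result pages, so on every page the IDs from earlier pages get the URL prefix added again. And playerID = list(set(playerID)) rebuilds the list while it is being indexed, which reorders it (so some entries may be prefixed twice and others not at all) and can shrink it below the range being iterated. A minimal standard-library sketch of an alternative: collect the raw IDs from every page first, then build each URL once. build_profile_urls and the sample IDs are hypothetical; the URL prefix is the one from the question.

```python
# Prefix taken from the question's code.
PROFILE_PREFIX = "https://www.transfermarkt.com/kylian-mbappe/profil/spieler/"


def build_profile_urls(player_ids):
    """Turn raw player IDs into unique profile URLs, keeping first-seen order.

    Each ID is prefixed exactly once. Empty or missing IDs (a link without an
    id attribute gives None) are skipped.
    """
    urls = (PROFILE_PREFIX + pid for pid in player_ids if pid)
    # dict.fromkeys drops duplicates and, unlike set(), preserves order.
    return list(dict.fromkeys(urls))


# Hypothetical IDs gathered from two result pages (one duplicate, one missing).
raw_ids = ["111", "222", None, "111", "333"]
for url in build_profile_urls(raw_ids):
    print(url)
```

The profile requests can then loop over the returned list. Note that in the code above, soup = BeautifulSoup(r.content, "html.parser") sits after the profile-request loop rather than inside it, so only the last profile fetched is ever parsed; indenting the parsing into that loop processes every profile.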

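The problem is this line. append is a method, so it has to be called with parentheses; here a square bracket is opened instead and never closed. Python therefore keeps reading the following lines as part of the same expression and only reports the error when it reaches the next for page in range(1, 21): statement, which is the line 45 shown in your traceback: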

playerImage.append[image.get('src')

Try replacing it with this line:

playerImage.append(image.get('src'))
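A quick way to confirm this kind of fix without running the scraper is to ask Python to compile the source. The sketch below compiles the broken line and the corrected one, each followed by the next for statement, using only the built-in compile(); compiles is a hypothetical helper name. The exact line and message reported for the broken version differ between Python versions, so the check only looks at whether compilation succeeds.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python, False otherwise."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# The line from the question, followed by the next statement in the script.
broken = (
    "playerImage.append[image.get('src')\n"
    "for page in range(1, 21):\n"
    "    pass\n"
)
# The same snippet with the fix applied.
fixed = broken.replace("append[image.get('src')", "append(image.get('src'))")

print(compiles(broken))  # False: the '[' is never closed
print(compiles(fixed))   # True
```

compile() only checks syntax, so the undefined names image and playerImage do not matter here.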


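Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.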

 
© 2020-2024 STACKOOM.COM