
Unable to scrape website pages with unchanged url - python

I am trying to get the names of all the games listed on the website "https://slotcatalog.com/en/The-Best-Slots#anchorFltrList". To do so, I am using the following code:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

url = "https://slotcatalog.com/en/The-Best-Slots#anchorFltrList"

page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

data = []
table = soup.find_all('div', attrs={'class':'providerCard'})

# print the title of each game card on the first page
for card in table:
    print(card.find('a')['title'])

and I get what I want. I would like to do the same for all the pages available on the website, but since the url does not change, I looked at the network (XHR) requests fired when clicking on a different page and tried to reproduce them with the following code:

for page_no in range(1, 100):
    data = {
            "blck":"fltrGamesBlk",
            "ajax":"1",
            "lang":"end",
            "p":str(page_no),
            "translit":"The-Best-Slots",
            "tag":"TOP",
            "dt1":"",
            "dt2":"",
            "sorting":"SRANK",
            "cISO":"GB",
            "dt_period":"",
            "rtp_1":"50.00",
            "rtp_2":"100.00",
            "max_exp_1":"2.00",
            "max_exp_2":"250000.00",
            "min_bet_1":"0.01",
            "min_bet_2":"5.00",
            "max_bet_1":"3.00",
            "max_bet_2":"10000.00"
        }
    page = requests.post('https://slotcatalog.com/index.php',
                         data=data,
                         headers={'Host': 'slotcatalog.com',
                                  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0'})


    soup = BeautifulSoup(page.content, 'html.parser')
    for row in soup.find_all('div', attrs={'class':'providerCard'}):
        name = row.find('a')['title']
        print(name)
        

The result is a KeyError: 'title', meaning that no elements with the class "providerCard" were found. Is the request to the website being made in the wrong way? If so, where should I change the code? Thanks in advance.

OK, so, you have a typo. XD It's the "lang":"end" in your payload, but it should be "lang": "en", and so on.

Anyway, I have cleaned up your code and it works as expected. You can go on and loop over all the game pages, if you want.

import requests
from bs4 import BeautifulSoup

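# request headers mimicking the site's own XHR call ("x-requested-with" marks it as AJAX)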
headers = {
    "referer": "https://slotcatalog.com/en/The-Best-Slots",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/50.0.2661.102 Safari/537.36",
    "x-requested-with": "XMLHttpRequest",
}

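# form data taken from the page's XHR request; "p" is the results page number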
payload = {
    "blck": "fltrGamesBlk",
    "ajax": "1",
    "lang": "en",
    "p": 1,
    "translit": "The-Best-Slots",
    "tag": "TOP",
    "dt1": "",
    "dt2": "",
    "sorting": "SRANK",
    "cISO": "EN",
    "dt_period": "",
    "rtp_1": "50.00",
    "rtp_2": "100.00",
    "max_exp_1": "2.00",
    "max_exp_2": "250000.00",
    "min_bet_1": "0.01",
    "min_bet_2": "5.00",
    "max_bet_1": "3.00",
    "max_bet_2": "10000.00"
}
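# POST to the AJAX endpoint rather than the visible page URL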
page = requests.post(
    "https://slotcatalog.com/index.php",
    data=payload,
    headers=headers,
)
soup = BeautifulSoup(page.content, "html.parser")
print([i.get("title") for i in soup.find_all("a", {"class": "providerName"})])


Output (for page 1 only):

['Starburst', 'Bonanza', 'Rainbow Riches', 'Book of Dead', "Fishin' Frenzy", 'Wolf Gold', 'Twin Spin', 'Slingo Rainbow Riches', "Gonzo's Quest", "Gonzo's Quest Megaways", 'Eye of Horus (Reel Time Gaming)', 'Age of the Gods God of Storms', 'Lightning Roulette', 'Buffalo Blitz', "Fishin' Frenzy Megaways", 'Fluffy Favourites', 'Blue Wizard', 'Legacy of Dead', '9 Pots of Gold', 'Buffalo Blitz II', 'Cleopatra (IGT)', 'Quantum Roulette', 'Reel King Mega', 'Mega Moolah', '7s Deluxe', "Rainbow Riches Pick'n'Mix", "Shaman's Dream"]
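If you want more than the first page, a minimal sketch along these lines should work, reusing the headers and payload defined above. The upper bound of 100 pages and the stop-on-empty-page check are assumptions on my part, not something the site documents:

all_titles = []
for page_no in range(1, 101):  # assumed upper bound; adjust as needed
    payload["p"] = page_no
    page = requests.post(
        "https://slotcatalog.com/index.php",
        data=payload,
        headers=headers,
    )
    soup = BeautifulSoup(page.content, "html.parser")
    titles = [i.get("title") for i in soup.find_all("a", {"class": "providerName"})]
    if not titles:  # assume an empty result means we are past the last page
        break
    all_titles.extend(titles)

print(len(all_titles))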
