unable to scrape website pages with unchanged url - python

I'm trying to get the names of all games on this website: "https://slotcatalog.com/en/The-Best-Slots#anchorFltrList". To do so I'm using the following code:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

url = "https://slotcatalog.com/en/The-Best-Slots#anchorFltrList"

page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

data = []
table = soup.find_all('div', attrs={'class': 'providerCard'})

# iterate over every card; range(0, len(table)-1) would skip the last one
for game in table:
    print(game.find('a')['title'])

and I get what I want. I would like to replicate the same across all the pages available on the website, but given that the URL does not change, I looked at the network (XHR) events that happen when clicking on a different page, and I tried to send the same request with the following code:

for page_no in range(1, 100):
    data = {
            "blck":"fltrGamesBlk",
            "ajax":"1",
            "lang":"end",
            "p":str(page_no),
            "translit":"The-Best-Slots",
            "tag":"TOP",
            "dt1":"",
            "dt2":"",
            "sorting":"SRANK",
            "cISO":"GB",
            "dt_period":"",
            "rtp_1":"50.00",
            "rtp_2":"100.00",
            "max_exp_1":"2.00",
            "max_exp_2":"250000.00",
            "min_bet_1":"0.01",
            "min_bet_2":"5.00",
            "max_bet_1":"3.00",
            "max_bet_2":"10000.00"
        }
    page = requests.post('https://slotcatalog.com/index.php',
                         data=data,
                         headers={'Host': 'slotcatalog.com',
                                  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0'
                                  })


    soup = BeautifulSoup(page.content, 'html.parser')
    for row in soup.find_all('div', attrs={'class':'providerCard'}):
        name = row.find('a')['title']
        print(name)
        

Result: ("KeyError: 'title'") - meaning that it's not finding the class "providerCard". Has the request to the website been done in the wrong way? If so, where should I change the code? Thanks in advance.

Alright, so, you had a typo. XD It was the "lang":"end" in the payload; it should have been "lang": "en", among other things.

Anyhow, I've cleaned your code up a bit and it works as expected. You can keep looping to get all the games, if you want.

import requests
from bs4 import BeautifulSoup

headers = {
    "referer": "https://slotcatalog.com/en/The-Best-Slots",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/50.0.2661.102 Safari/537.36",
    "x-requested-with": "XMLHttpRequest",
}

payload = {
    "blck": "fltrGamesBlk",
    "ajax": "1",
    "lang": "en",
    "p": 1,
    "translit": "The-Best-Slots",
    "tag": "TOP",
    "dt1": "",
    "dt2": "",
    "sorting": "SRANK",
    "cISO": "EN",
    "dt_period": "",
    "rtp_1": "50.00",
    "rtp_2": "100.00",
    "max_exp_1": "2.00",
    "max_exp_2": "250000.00",
    "min_bet_1": "0.01",
    "min_bet_2": "5.00",
    "max_bet_1": "3.00",
    "max_bet_2": "10000.00"
}
page = requests.post(
    "https://slotcatalog.com/index.php",
    data=payload,
    headers=headers,
)
soup = BeautifulSoup(page.content, "html.parser")
print([i.get("title") for i in soup.find_all("a", {"class": "providerName"})])


Output (for page 1 only):

['Starburst', 'Bonanza', 'Rainbow Riches', 'Book of Dead', "Fishin' Frenzy", 'Wolf Gold', 'Twin Spin', 'Slingo Rainbow Riches', "Gonzo's Quest", "Gonzo's Quest Megaways", 'Eye of Horus (Reel Time Gaming)', 'Age of the Gods God of Storms', 'Lightning Roulette', 'Buffalo Blitz', "Fishin' Frenzy Megaways", 'Fluffy Favourites', 'Blue Wizard', 'Legacy of Dead', '9 Pots of Gold', 'Buffalo Blitz II', 'Cleopatra (IGT)', 'Quantum Roulette', 'Reel King Mega', 'Mega Moolah', '7s Deluxe', "Rainbow Riches Pick'n'Mix", "Shaman's Dream"]
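
If you want every page rather than just the first one, you can wrap the same request in a loop over "p". Below is a minimal sketch, assuming the endpoint keeps accepting increasing page numbers and that a page with no providerName links means there are no more results (that stop condition is an assumption, not something the site documents):

import requests
from bs4 import BeautifulSoup

headers = {
    "referer": "https://slotcatalog.com/en/The-Best-Slots",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/50.0.2661.102 Safari/537.36",
    "x-requested-with": "XMLHttpRequest",
}

all_games = []
for page_no in range(1, 100):  # upper bound is arbitrary; the loop breaks earlier
    payload = {
        "blck": "fltrGamesBlk",
        "ajax": "1",
        "lang": "en",
        "p": page_no,
        "translit": "The-Best-Slots",
        "tag": "TOP",
        "dt1": "",
        "dt2": "",
        "sorting": "SRANK",
        "cISO": "EN",
        "dt_period": "",
        "rtp_1": "50.00",
        "rtp_2": "100.00",
        "max_exp_1": "2.00",
        "max_exp_2": "250000.00",
        "min_bet_1": "0.01",
        "min_bet_2": "5.00",
        "max_bet_1": "3.00",
        "max_bet_2": "10000.00",
    }
    page = requests.post("https://slotcatalog.com/index.php",
                         data=payload, headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    titles = [a.get("title") for a in soup.find_all("a", {"class": "providerName"})]
    if not titles:  # assumption: an empty page means we ran past the last page
        break
    all_games.extend(titles)

print(len(all_games))
print(all_games[:5])

The range(1, 100) upper bound is just a safety cap; the break on an empty page is what actually ends the loop.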
