
Two almost identical code snippets: one works, the other doesn't

I don't know why the first snippet works but the second doesn't. With the Adidas code I get the error "connection aborted, OSError 10054". I've heard something about websites having APIs; to be honest I don't know what that is, but I feel it's related :D

IT WORKS:

import requests
from bs4 import BeautifulSoup

odpowiedz = requests.get("https://www.nike.com/pl/w?q=react%20270&vst=react%20270")
soup = BeautifulSoup(odpowiedz.text, 'html.parser')

IT DOESN'T WORK:

import requests
from bs4 import BeautifulSoup

odpowiedz = requests.get("https://www.adidas.pl/search?q=ultraboost")
soup = BeautifulSoup(odpowiedz.text, 'html.parser')

You can use Selenium instead of requests to get the page source:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("https://www.adidas.pl/search?q=ultraboost")
source = driver.page_source

soup = BeautifulSoup(source, 'html.parser')

If you want to close Chrome once you have the page source, call driver.quit():

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("https://www.adidas.pl/search?q=ultraboost")
source = driver.page_source
driver.quit()

soup = BeautifulSoup(source, 'html.parser')
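
One caveat with the snippet above: if driver.get() raises an exception, driver.quit() is never reached and the Chrome process stays open. Here is a minimal sketch of one way to guard against that with a small context manager. The name managed_driver and its factory argument are illustrative only and are not part of Selenium:

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver via factory() and always call quit() on exit.

    `managed_driver` is a hypothetical helper, not a Selenium API.
    """
    driver = factory()
    try:
        yield driver
    finally:
        # Runs even if the body of the `with` block raises.
        driver.quit()
```

With Selenium installed, this could be used as `with managed_driver(webdriver.Chrome) as driver:` followed by the same driver.get(...) and driver.page_source lines as before.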

If you don't want the Chrome window to appear, run it in headless mode:

from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
driver.get("https://www.adidas.pl/search?q=ultraboost")
source = driver.page_source
driver.quit()

soup = BeautifulSoup(source, 'html.parser')

Daweo is right, the Adidas server checks the User-Agent header.

This works for me:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0",
    # Optional extra headers; the request worked without them:
    # "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    # "Accept-Language": "en-US,en;q=0.5",
}

odpowiedz = requests.get("https://www.adidas.pl/search?q=ultraboost", headers=headers)
soup = BeautifulSoup(odpowiedz.text, 'html.parser')

It even accepts "aaaaaaaaaaaaaadaaaMozilla" as the User-Agent.
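
For context: when no User-Agent is supplied, requests identifies itself as python-requests/<version>, which is presumably what the Adidas server rejects. The short sketch below prints that default and shows one way to set a browser-like value once on a Session. Nothing is sent over the network; the request is only prepared so its headers can be inspected:

```python
import requests

# The User-Agent that requests sends when none is supplied,
# of the form "python-requests/<installed version>".
default_ua = requests.utils.default_user_agent()
print(default_ua)

# A Session applies the same headers to every request made through it.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0",
})

# Prepare (but do not send) a request to see which headers would go out.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.adidas.pl/search?q=ultraboost")
)
print(prepared.headers["User-Agent"])
```

Calling session.get(...) afterwards would send the request with the same header, so it does not need to be repeated on every call.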

For Adidas.com, if you don't send an acceptable User-Agent, it returns a page explaining why:

During high-traffic product releases we have extra security in place to prevent bots entering our site. We do this to protect customers and to give everyone a fair chance of getting the sneakers. Something in your setup must have triggered our security system, so we cannot allow you onto the site.
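
Since the server answers with a page rather than the search results, it may be worth checking for that page before parsing. This is a minimal sketch, assuming the wording quoted above stays the same. looks_blocked and BLOCK_MARKER are hypothetical names, and the real page text may differ or change:

```python
# Marker phrase taken from the block page quoted above (an assumption:
# the site may reword this message at any time).
BLOCK_MARKER = "extra security in place to prevent bots"


def looks_blocked(html: str) -> bool:
    """Return True if the HTML appears to be the bot-protection page."""
    return BLOCK_MARKER.lower() in html.lower()
```

In the requests example this could be called as looks_blocked(odpowiedz.text) before handing the text to BeautifulSoup.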


 