
Unable to scrape different plumber names from a static webpage using requests module

I've been trying for the last couple of hours to scrape the plumber names from this webpage using the requests module, since the site's content is static and also appears in the page source (Ctrl + U).

However, when I run the script, I get the following error:

raise TooManyRedirects('Exceeded {} redirects.'.format(self.max_redirects), response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

This is what I've tried:

from bs4 import BeautifulSoup
import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = 'https://www.yellowpages.co.za/search?what=plumber&where=bryanston+west%2c+sandton%2c+gauteng&pg=2'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Referer': 'https://www.yellowpages.co.za/search?what=plumber&where=bryanston+west%2c+sandton%2c+gauteng&pg=1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Host': 'www.yellowpages.co.za',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link,verify=False)
    soup = BeautifulSoup(res.text,"html.parser")
    for shop_name in soup.select("h5.nameOverflow"):
        print(shop_name.get_text(strip=True))

I tried to make it work with requests, but in the end I used urlopen() from Python's standard library:

import ssl
from bs4 import BeautifulSoup
from urllib.request import urlopen


# Disable certificate verification (the urlopen counterpart of verify=False).
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

response = urlopen(
    "https://www.yellowpages.co.za/search?what=plumber&where=bryanston+west%2c+sandton%2c+gauteng&pg=2",
    context=ctx,
)

soup = BeautifulSoup(response, "html.parser")
for shop_name in soup.select("h5.nameOverflow"):
    print(shop_name.get_text(strip=True))

Prints:

Absolutely Fast Plumbing Co CC
Property Matters Gauteng (Pty) Ltd
Plumlite
Geyser Man
Daryn's Plumbing Services (Pty) Ltd
Electroc
Outek Engineers CC
Bryanston Plumbing (Pty) Ltd
Mage Plumbing & Electrical
Fourways Plumbing
Matrix Plumber
Angel Plumbers
Renovations And Maintenance Services
DCB Supplies
Call Us Plumbing
A B A Group
Action Plumbing
Clearline Plumbing Services
Capital Plumbing Supplies
AGD Plumbing

You're getting this error because the request to that link is redirected quite a few times, exceeding the limit of 30 redirects that requests allows by default. Take a look at this thread, and let me know if that works for you.
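One way to see the redirects for yourself is to turn off automatic redirect handling and follow the Location headers one hop at a time. The sketch below is only a diagnostic aid, not a fix: trace_redirects is a hypothetical helper name, the max_hops cap is an arbitrary choice, and the session argument is assumed to be a requests.Session (or anything with a compatible get method).

```python
from urllib.parse import urljoin


def trace_redirects(session, url, max_hops=10, **kwargs):
    """Follow redirects manually and return a list of (status_code, url) hops.

    `session` is assumed to be a requests.Session. Extra keyword arguments
    (for example verify=False) are passed through to session.get().
    """
    hops = []
    for _ in range(max_hops):
        # allow_redirects=False makes requests return the 3xx response itself
        # instead of following it automatically.
        resp = session.get(url, allow_redirects=False, **kwargs)
        hops.append((resp.status_code, url))
        location = resp.headers.get("Location")
        if not resp.is_redirect or not location:
            break  # final response reached (or a redirect with no target)
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops


# Example usage (requires the third-party requests package and network access):
#
#   import requests
#   with requests.Session() as s:
#       for status, hop_url in trace_redirects(s, link, verify=False):
#           print(status, hop_url)
```

If the same URL keeps reappearing in the output, the server is sending the client in a loop, which is what eventually triggers TooManyRedirects.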

