
Web scraping using Beautiful Soup

I'm using Beautiful Soup to scrape a site.

Code:

    from bs4 import BeautifulSoup as soup
    
    from urllib.request import urlopen as uReq
    my_url = 'https://www.bewakoof.com/biker-t-shirts'
    uClient = uReq(my_url)
    
    
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, "html.parser")
    
    containers = page_soup.findAll("div", {"class": "productGrid"})
    
    print(len(containers))

I am getting the error shown below.

Error:

    o = containerClass(current_data)
    TypeError: __init__() takes 1 positional argument but 2 were given

When I tried to run part of your code, I also got an error.

After that, I tried using requests:

    >>> import requests
    >>> my_url = 'https://www.bewakoof.com/biker-t-shirts'
    >>> r = requests.get(my_url)
    >>> r
    <Response [403]>

You got status code 403 (Forbidden), which means that the server understood the request but refuses to authorize it.
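As a rough sketch of how the same 403 shows up when using `urllib` (as in the question): `urllib.request.urlopen` raises `urllib.error.HTTPError` for error responses, and the status code is available on the exception. The helper name `fetch_status` is hypothetical and only for illustration.

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for url, including error codes such as 403."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code
```

Checking the status this way before handing anything to the parser makes it clear whether the failure comes from the server or from the parsing code.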

Most often, this error comes from basic protection against scrapers. To get around it, send headers that make the site think the request comes from a browser. To do this, install the requests library, then create a dict:

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}

You can substitute your own values for these. The easiest way to find them is in the Network tab of your browser's developer tools (press F12 in Chrome).

Then:

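    import requests as req

    url = "url"  # replace with the address of the page you want to fetch
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
    # Pass headers by keyword: the second positional argument of get() is params, not headers.
    r = req.get(url, headers=headers)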
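If you would rather stay with `urllib` as in the question, the same idea can be sketched with `urllib.request.Request`, which accepts a `headers` dict. The names `USER_AGENT`, `build_request` and `fetch_html` are hypothetical helpers for illustration, and there is no guarantee that a given site will accept the request.

```python
import urllib.request

# Same User-Agent string as above; substitute your own browser's value if you prefer.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url):
    """Download the page body as bytes (performs a real network request)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

The bytes returned by `fetch_html(my_url)` can then be passed to `soup(page_html, "html.parser")` exactly as in the question's code.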

But in this situation, the problem is different: the site you are trying to access simply does not work.

