
Beautiful Soup: some errors

So, I was making a web scraper for Amazon as a personal project, but I am stuck on a problem: whenever I use get_text it raises an AttributeError, even though it works perfectly fine in the video I was following. I don't understand why. At first I didn't use the headers at all, but then I thought that might be the fault, so I copied them exactly as the instructor wrote them in the video tutorial.

import requests
from bs4 import BeautifulSoup
URL="https://www.amazon.in/dp/B074WZJ4MF/ref=redir_mobile_desktop?_encoding=UTF8&aaxitk=8bc2212eee66e1c1bdca057df16f612f&hsa_cr_id=2722802130102&pd_rd_plhdr=t&pd_rd_r=135b3806-45ad-402d-9df7-0f14d458f874&pd_rd_w=19o2S&pd_rd_wg=TBmei&ref_=sbx_be_s_sparkle_mcd_asin_0_title"
HEADERS={"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"}

def getprice():
    page= requests.get(URL, headers=HEADERS)

    # print(htmlcontent)
    soup=BeautifulSoup(page.content,'html.parser')
    # print(soup.prettify)
    title=soup.find(id="productTitle").get_text()
    
    print(title)

if __name__=="__main__":
    getprice()

That is the code. I don't know why this is happening. Let me show you the output too: The Output

The link is just a randomly chosen link, and the id is that of the product title, which I want to display. Please help; I have searched the whole internet for this.

Your HEADERS variable should be a dictionary. As written, the braces contain a single string with no key, so Python builds a set instead. Set the User-Agent key explicitly:

HEADERS={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"}
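To see the difference between the two literals, here is a minimal sketch that uses only the standard library. The shortened User-Agent string and the helper name build_headers are illustrative assumptions, not part of the original code. The likely link to the error is that requests walks the headers as name/value pairs, which a set cannot provide; that is my reading of the AttributeError rather than something shown in the question.

```python
# A string inside braces with no "key: value" pair is a set, not a dict.
UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # shortened, illustrative value

as_written = {UA}                 # set literal, as in the question
as_intended = {"User-Agent": UA}  # dict literal, as in the answer


def build_headers(user_agent):
    """Hypothetical helper: wrap a User-Agent string in a headers dict."""
    return {"User-Agent": user_agent}


print(type(as_written).__name__)      # set
print(type(as_intended).__name__)     # dict
print(hasattr(as_written, "items"))   # False: a set has no .items()
print(hasattr(as_intended, "items"))  # True
```

Passing build_headers(UA) as the headers argument gives requests a mapping it can iterate over.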

If you are looking for an easy way to get a solution, you can scrape the page with Selenium.

Here is the code.

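from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style; older releases used webdriver.Chrome(path) and find_element_by_xpath
driver = webdriver.Chrome(service=Service("C:/chromedriver.exe"))
url = 'https://.....'
driver.get(url)
price = driver.find_element(By.XPATH, "//span[@class='a-price a-text-price a-size-medium apexPriceToPay']").text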
