I am having some trouble with web scraping using BeautifulSoup

Whenever I try to extract the text between tags using .text(), I get a blank screen with just [] as the output.

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.amazon.in/s?k=ssd&ref=nb_sb_noss")

soup = BeautifulSoup(page.content, "html.parser")

product = soup.find_all("h2",class_="a-link-normal a-text-normal")
results = soup.find_all("span",class_="a-offscreen")

print(product)

This is the output that I got:

C:\Users\Kushal\Desktop\requests-tutorial>C:/Users/Kushal/AppData/Local/Programs/Python/Python37/python.exe c:/Users/Kushal/Desktop/requests-tutorial/scraper.py
[]

When I try listing everything with a for loop, nothing shows up at all, not even the empty square brackets.

Based on your comment below, I've modified the code to fetch all the product titles on that page along with their prices.

Mark this as the answer if it works; otherwise, comment for further analysis.

import requests
from bs4 import BeautifulSoup

# Send browser-like headers; without them the site may return a page
# that contains none of the product markup.
headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-charset": "cp1254,ISO-8859-9,utf-8;q=0.7,*;q=0.3",
    "accept-encoding": "gzip,deflate,sdch",
    "accept-language": "tr,tr-TR,en-US,en;q=0.8",
}

response = requests.get("https://www.amazon.in/s?k=ssd&ref=nb_sb_noss", headers=headers)

# The "lxml" parser needs the lxml package installed; it does not need to be imported.
soup = BeautifulSoup(response.content, "lxml")

titles = soup.find_all("span", attrs={"class": "a-size-medium a-color-base a-text-normal"})
prices = soup.find_all("span", attrs={"class": "a-offscreen"})

# Pair each title with the price at the same position and print both.
for title, price in zip(titles, prices):
    print(title.text.strip(), "-", price.text.strip())
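
If you want to try the selectors without hitting the site, here is a minimal offline sketch. The HTML in SAMPLE_HTML is made up for illustration and is not Amazon's real markup; the "s-result-item" container class and the extract_products helper are likewise assumptions of mine. It shows two things: a multi-class string passed to find_all only matches the exact class attribute value on that tag, and pairing title and price inside each result container avoids the misalignment that zip() can produce when an item has no price.

```python
from bs4 import BeautifulSoup

# Made-up sample markup (an assumption, NOT the real Amazon page).
SAMPLE_HTML = """
<div class="s-result-item">
  <h2 class="a-size-mini"><a class="a-link-normal a-text-normal" href="#">
    <span class="a-size-medium a-color-base a-text-normal">Example SSD 500GB</span></a></h2>
  <span class="a-price"><span class="a-offscreen">INR 4,999</span></span>
</div>
<div class="s-result-item">
  <h2 class="a-size-mini"><a class="a-link-normal a-text-normal" href="#">
    <span class="a-text-normal a-size-medium a-color-base">Example SSD 1TB</span></a></h2>
  <span class="a-price"><span class="a-offscreen">INR 8,499</span></span>
</div>
"""


def extract_products(html):
    """Return (title, price) tuples, pairing them inside each result container."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.select("div.s-result-item"):
        # CSS class selectors match regardless of the order of the classes.
        title = item.select_one("span.a-size-medium.a-text-normal")
        price = item.select_one("span.a-offscreen")
        if title is None or price is None:
            continue  # skip items that lack a title or a price
        products.append((title.get_text(strip=True), price.get_text(strip=True)))
    return products


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    # In this sample the two classes sit on the <a> tag, not on <h2>,
    # so searching <h2> for them finds nothing.
    print(soup.find_all("h2", class_="a-link-normal a-text-normal"))
    for title, price in extract_products(SAMPLE_HTML):
        print(title, "-", price)
```

On the live page, the class names and the container selector would need to be checked against the HTML you actually receive, for example by printing response.text.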
    
         
