
Python web scraping from eBay

I'm trying to write a program that scrapes the title of the first item in a laptop product list on ebay.com. I suspect the last two lines of code fail to match the correct tag and attribute. Please tell me why the code can't find the information and what you would recommend. Thanks for reading.

import requests
import re
from bs4 import BeautifulSoup

url = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=laptop&_sacat=0&_pgn=1"
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"}
res = requests.get(url, headers=headers)
res.raise_for_status()
soup = BeautifulSoup(res.text, "lxml")

# print(res.text)
items = soup.find_all("div", attrs={"class":re.compile("^sg-col-inner")}) 
print(items[0].find("span", attrs={"class":"a-size-medium a-color-base a-text-normal"}).get_text()) # Error
# IndexError: list index out of range

The first tag with class="s-item" doesn't contain an <h3> tag (you can see this by inspecting the HTML structure of the page). Also, the class names in your code (sg-col-inner, a-size-medium a-color-base a-text-normal) appear to come from Amazon's markup, so find_all() matches nothing on the eBay page and items[0] raises the IndexError. The following example prints the titles of all search results:

import requests
from bs4 import BeautifulSoup

url = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=laptop&_sacat=0&_pgn=1"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
}
res = requests.get(url, headers=headers)
res.raise_for_status()
soup = BeautifulSoup(res.text, "lxml")

# Limit the match to listings inside the results container (#srp-river-results)
for item in soup.select("#srp-river-results li.s-item"):
    print(item.h3.text)

Prints:

Lenovo ThinkPad T400 2 Duo P8400 3GB 160GB HDD 1280x800 WiFi DVD Windows 10 Pro
HP ProBook 655 G1 15.6" Laptop AMD CPU 2.5GHz 4GB 250GB Windows 10
Lenovo ThinkPad Yoga 370 Intel i5 8GB DDR4 512GB SSD 1920x1080 IPS Windows 10 Pro
Portatil acer extensa ex215-52-37y7 Core i3-1005g1 8gb ddr4 ssd 256gb FHD I

...
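If you only need the first title, or want the script to survive listings that lack an <h3>, a more defensive variant might look like the sketch below. It runs on a small, made-up HTML snippet (SAMPLE_HTML is a hypothetical stand-in, not real eBay markup) so it can be tried offline, and it uses the built-in "html.parser" instead of "lxml" only to avoid the extra dependency. The helper name extract_titles is likewise just an illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for an eBay results page; the real markup
# is much larger and may change at any time.
SAMPLE_HTML = """
<ul id="srp-river-results">
  <li class="s-item"><div class="s-item__info">Placeholder without a heading</div></li>
  <li class="s-item"><h3 class="s-item__title">Lenovo ThinkPad T400</h3></li>
  <li class="s-item"><h3 class="s-item__title">HP ProBook 655 G1</h3></li>
</ul>
"""


def extract_titles(html):
    """Return the <h3> text of every li.s-item inside #srp-river-results."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for item in soup.select("#srp-river-results li.s-item"):
        heading = item.find("h3")
        if heading is None:
            # Skip entries without a heading instead of raising AttributeError
            continue
        titles.append(heading.get_text(strip=True))
    return titles


titles = extract_titles(SAMPLE_HTML)
if titles:
    # Check that the list is non-empty before indexing to avoid IndexError
    print(titles[0])
else:
    print("No titles found; the selectors may not match this page.")
```

With a live page you would pass res.text to extract_titles() instead of SAMPLE_HTML. Checking for an empty list before indexing is what avoids the IndexError from the question when the selectors match nothing.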
