Python web scraping from eBay

I'm trying to write a program that scrapes the title of the first item in the laptop search results on ebay.com. I suspect the last two lines of code fail to match the correct tag and attribute. Could you tell me why the code cannot find the information, and what you would recommend? Thanks for reading.

import requests
import re
from bs4 import BeautifulSoup

url = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=laptop&_sacat=0&_pgn=1"
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"}
res = requests.get(url, headers=headers)
res.raise_for_status()
soup = BeautifulSoup(res.text, "lxml")

# print(res.text)
items = soup.find_all("div", attrs={"class":re.compile("^sg-col-inner")}) 
print(items[0].find("span", attrs={"class":"a-size-medium a-color-base a-text-normal"}).get_text()) # Error
# IndexError: list index out of range

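The classes your code searches for (sg-col-inner and a-size-medium a-color-base a-text-normal) appear to come from Amazon's markup rather than eBay's, so find_all() returns an empty list and items[0] raises the IndexError. On eBay, each search result is an element with class="s-item", but the first such element doesn't contain an <h3> tag (you will see this when you inspect the HTML structure of the page). The following example limits the selection to the #srp-river-results container and prints the titles of all search results: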

import requests
from bs4 import BeautifulSoup

url = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=laptop&_sacat=0&_pgn=1"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
}
res = requests.get(url, headers=headers)
res.raise_for_status()
soup = BeautifulSoup(res.text, "lxml")

for item in soup.select("#srp-river-results li.s-item"):
    print(item.h3.text)

Prints:

Lenovo ThinkPad T400 2 Duo P8400 3GB 160GB HDD 1280x800 WiFi DVD Windows 10 Pro
HP ProBook 655 G1 15.6" Laptop AMD CPU 2.5GHz 4GB 250GB Windows 10
Lenovo ThinkPad Yoga 370 Intel i5 8GB DDR4 512GB SSD 1920x1080 IPS Windows 10 Pro
Portatil acer extensa ex215-52-37y7 Core i3-1005g1 8gb ddr4 ssd 256gb FHD I

...
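
If some result elements lack an <h3> tag, item.h3 is None and item.h3.text raises AttributeError. The sketch below shows one way to guard against that. It runs against a small inline HTML sample rather than the live site; SAMPLE_HTML and extract_titles are hypothetical names introduced for illustration, the sample markup is a simplified assumption about the page structure, and "html.parser" is used only so the sketch does not require lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the eBay results markup (an assumption,
# not a copy of the real page).
SAMPLE_HTML = """
<ul>
  <li class="s-item"><div>Shop on eBay</div></li>
</ul>
<div id="srp-river-results">
  <ul>
    <li class="s-item"><h3 class="s-item__title">Lenovo ThinkPad T400</h3></li>
    <li class="s-item"><div>Entry without a title heading</div></li>
    <li class="s-item"><h3 class="s-item__title">  HP ProBook 655 G1  </h3></li>
  </ul>
</div>
"""


def extract_titles(html):
    """Return the <h3> text of every result item, skipping items without one."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for item in soup.select("#srp-river-results li.s-item"):
        heading = item.find("h3")
        if heading is None:
            # No title in this element; skip it instead of raising AttributeError.
            continue
        titles.append(heading.get_text(strip=True))
    return titles


titles = extract_titles(SAMPLE_HTML)
# Check the list before indexing so an empty result does not raise IndexError.
print(titles[0] if titles else "No titles found")
```

To use this with the live page, pass res.text from the request above to extract_titles instead of SAMPLE_HTML.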
