Webscraping - I need some help understanding how to distinguish an item on a page (BS4, Requests)

I am stuck. I am able to extract the product name and price from Amazon using the following code:

import requests
from bs4 import BeautifulSoup


# Send a browser-like User-Agent; Amazon may block the default requests one
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"}
url = 'https://www.amazon.co.uk/dp/B083PHB6XX'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')  # the 'lxml' parser requires the lxml package

# Both values live in <span> elements identified by their id attribute
name = soup.find('span', {'id': 'productTitle'}).text.strip()
price = soup.find('span', {'id': 'priceblock_ourprice'}).text.strip()

print(name)
print(price)

But I am unable to figure out how to extract the sales rank from the table lower down the page, under the Additional Information section. I'd be most grateful if anyone could help me write the next soup.find line of code so that it shows '106,505' for the sales rank.

Many thanks in advance.
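A side note on the code above: `soup.find(...)` returns `None` when no matching element exists (for example, if the page uses a different price layout or Amazon serves a CAPTCHA page instead), and calling `.text` on `None` raises `AttributeError`. Below is a minimal offline sketch of a guard, using a hypothetical helper `find_text` and a simplified HTML snippet that stands in for the real page (it is not Amazon's actual markup). It uses the built-in `html.parser` so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a product page; the real Amazon markup differs.
HTML = """
<html><body>
  <span id="productTitle">  Example Product  </span>
  <!-- note: no span with id="priceblock_ourprice" on this page -->
</body></html>
"""


def find_text(soup, tag, element_id):
    """Hypothetical helper: stripped text of <tag id=element_id>, or None if it is absent."""
    element = soup.find(tag, {"id": element_id})
    return element.get_text(strip=True) if element is not None else None


soup = BeautifulSoup(HTML, "html.parser")
name = find_text(soup, "span", "productTitle")
price = find_text(soup, "span", "priceblock_ourprice")

print(name)   # Example Product
print(price)  # None
```

Returning `None` lets the caller decide what a missing field means instead of crashing on the first absent element.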

One solution is to search for the <th> tag that contains the string "Best Sellers Rank" and then find the next <span>:

import requests
from bs4 import BeautifulSoup


url = "https://www.amazon.co.uk/dp/B083PHB6XX"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
}

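soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

# Find the <th> header cell whose text contains "Best Sellers Rank",
# then take the first <span> after it in the document (the rank value)
ranks = soup.select_one('th:-soup-contains("Best Sellers Rank")').find_next(
    "span"
)

# The first whitespace-separated token of the span's text is the rank number
print(ranks.text.split()[0])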

Prints (the value differs from the question's 106,505 because the rank changes over time):

111,190
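To see how the `th:-soup-contains(...)` selector and `find_next("span")` fit together without hitting Amazon, here is an offline sketch against a hypothetical, simplified product-details table (the real page's markup may differ). `:-soup-contains` is a pseudo-class provided by soupsieve, the CSS selector library Beautiful Soup uses for `select`/`select_one`. The sketch also shows an equivalent plain `soup.find` lookup, which is what the question originally asked for, and converts the rank text to an integer by dropping the thousands separators.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified version of the "Additional Information" table.
HTML = """
<table id="productDetails">
  <tr><th>Customer Reviews</th><td><span>4.5 out of 5 stars</span></td></tr>
  <tr>
    <th>Best Sellers Rank</th>
    <td><span><span>111,190 in Electronics (See Top 100)</span></span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# CSS route: soupsieve's :-soup-contains() matches on the element's text.
header = soup.select_one('th:-soup-contains("Best Sellers Rank")')
if header is None:
    raise ValueError("Best Sellers Rank row not found")

# soup.find route: the string filter receives the <th>'s single text child.
header_via_find = soup.find(
    "th", string=lambda s: s is not None and "Best Sellers Rank" in s
)

# find_next() walks forward in document order, reaching the <span> inside the row's <td>.
span = header.find_next("span")
rank_text = span.get_text(strip=True)

# The leading token is the rank; remove commas before converting to int.
rank = int(rank_text.split()[0].replace(",", ""))

print(rank_text.split()[0])  # 111,190
print(rank)                  # 111190
```

Checking for `None` before calling `find_next` gives a clear error if the row is missing, rather than an `AttributeError` on `None`.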
