Web scraping - I need some help understanding how to distinguish items on a page (BS4, requests)

Question

I am stuck. I am able to extract the product name and price from Amazon using the following code:

import requests
from bs4 import BeautifulSoup
import pandas as pd


headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"}
url = 'https://www.amazon.co.uk/dp/B083PHB6XX'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')

name = soup.find('span', {'id': 'productTitle'}).text.strip()
price = soup.find('span', {'id': 'priceblock_ourprice'}).text.strip()

print(name)
print(price)

But I am unable to figure out how to extract the sales rank from the table lower down on the page, under the Additional Information section. I'd be most grateful if anyone could help me write the next soup.find line of code so that it shows '106,505' for the sales rank.

Many thanks in advance.

Answer 1

One solution is to search for the <th> tag that contains the string "Best Sellers Rank" and then find the next <span>:

import requests
from bs4 import BeautifulSoup


url = "https://www.amazon.co.uk/dp/B083PHB6XX"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
ranks = soup.select_one('th:-soup-contains("Best Sellers Rank")').find_next(
    "span"
)

print(ranks.text.split()[0])

Prints:

111,190
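
If the request is blocked or the page layout changes, select_one returns None and the chained find_next call raises an AttributeError. The following is a minimal sketch of the same <th> lookup with that case handled, run against an inline HTML string instead of the live page. The SAMPLE_HTML table, its category name, and the extract_sales_rank helper are illustrative assumptions, not Amazon's actual markup; the rank value is the one quoted in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the "Additional Information" table.
# The real Amazon markup is more deeply nested and may change at any time.
SAMPLE_HTML = """
<table id="productDetails">
  <tr>
    <th>Customer Reviews</th>
    <td><span>4.5 out of 5 stars</span></td>
  </tr>
  <tr>
    <th>Best Sellers Rank</th>
    <td><span><span>106,505 in Example Category</span></span></td>
  </tr>
</table>
"""


def extract_sales_rank(html):
    """Return the first Best Sellers Rank as an int, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Same lookup as in the answer: the <th> whose text contains the label.
    header = soup.select_one('th:-soup-contains("Best Sellers Rank")')
    if header is None:
        return None  # e.g. a CAPTCHA page or a changed layout
    span = header.find_next("span")
    if span is None:
        return None
    tokens = span.get_text(strip=True).split()
    if not tokens:
        return None
    # Drop the thousands separator before converting to an integer.
    digits = tokens[0].replace(",", "")
    return int(digits) if digits.isdigit() else None


print(extract_sales_rank(SAMPLE_HTML))
print(extract_sales_rank("<html><body>No table here</body></html>"))
```

Returning None instead of raising lets the caller decide whether to retry, skip the product, or log the failure. To use it with the live page, pass requests.get(url, headers=headers).content as the html argument.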

Answer 1
0 2021-10-12 17:30:48
