Unable to scrape information from a static webpage using the requests module

Question

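I'm trying to use the requests module to scrape the product title and its description from a webpage. The title and description appear to be static, since both are present in the page source. However, my attempt below fails to grab them: the script throws an AttributeError.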

import requests
from bs4 import BeautifulSoup

link = 'https://www.nordstrom.com/s/anine-bing-womens-plaid-shirt/6638030'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    product_title = soup.select_one("h1[itemProp='name']").text
    product_desc = soup.select_one("#product-page-selling-statement").text
    print(product_title,product_desc)

How can I scrape the title and description from the above page using the requests module?

Answer 1

The page is dynamic. Get the data from the API source:

import requests
import pandas as pd

api = 'https://www.nordstrom.com/api/ng-looks/styleId/6638030?customerId=f36cf526cfe94a72bfb710e5e155f9ba&limit=7'
jsonData = requests.get(api).json()

df = pd.json_normalize(jsonData['products'].values())

print(df.iloc[0])

Output:

id                                                       6638030-400
name                                  ANINE BING Women's Plaid Shirt
styleId                                                      6638030
styleNumber                                                         
colorCode                                                        400
colorName                                                       BLUE
brandLabelName                                            ANINE BING
hasFlatShot                                                     True
imageUrl           https://n.nordstrommedia.com/id/sr3/6d000f40-8...
price                                                        $149.00
pathAlias          anine-bing-womens-plaid-shirt/6638030?origin=c...
originalPrice                                                $149.00
productTypeLvl1                                                   12
productTypeLvl2                                                  216
isUmap                                                         False
Name: 0, dtype: object
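
If pandas is not needed, the same fields can probably be read straight from the parsed JSON. The following is only a sketch: it assumes that `products` is a dict mapping some key to one product dict each (which is what the `.values()` call above suggests), and the `sample` payload, including its key, is made up to mirror the output shown rather than copied from a real response.

```python
def product_fields(json_data, fields=("name", "price", "colorName")):
    """Collect the chosen fields from every product in the payload.

    Assumes json_data["products"] is a dict of product dicts; missing
    fields come back as None instead of raising KeyError.
    """
    rows = []
    for product in json_data.get("products", {}).values():
        rows.append({field: product.get(field) for field in fields})
    return rows


# Hypothetical payload shaped like the API response described above.
sample = {
    "products": {
        "6638030-400": {
            "id": "6638030-400",
            "name": "ANINE BING Women's Plaid Shirt",
            "price": "$149.00",
            "colorName": "BLUE",
        }
    }
}

for row in product_fields(sample):
    print(row["name"], row["price"], row["colorName"])
```

With a live response, `product_fields(requests.get(api).json())` would be the equivalent call, provided the payload really has that shape.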

Answer 2

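When testing requests like this, you should print the response to see what you are actually getting. It's best to use something like Postman (I think VSCode has similar functionality now) to set up the URL, headers, method, and parameters, and to view the full response along with its headers. Once everything works, just convert it to Python code. Postman even has an "export to code" feature for several common languages.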
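
For a quick check without leaving Python, something like the helper below can be printed before any parsing is attempted. It is a minimal sketch, not part of the original script: `summarize_response` is a hypothetical name, and it only assumes the object passed in exposes `status_code`, `headers`, and `text`, as a `requests.Response` does.

```python
def summarize_response(res, preview_chars=300):
    """Return a short, printable summary of an HTTP response.

    `res` is assumed to expose `status_code`, `headers`, and `text`
    (a requests.Response does). Only the first `preview_chars`
    characters of the body are included.
    """
    content_type = res.headers.get("Content-Type", "unknown")
    lines = [
        f"status: {res.status_code}",
        f"content-type: {content_type}",
        f"length: {len(res.text)} characters",
        "preview:",
        res.text[:preview_chars],
    ]
    return "\n".join(lines)
```

In the question's script this would be called as `print(summarize_response(res))` right after `res = s.get(link)`. A non-200 status or a body that does not contain the expected markup would explain why `select_one` returns `None` and the `.text` access raises AttributeError.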

Anyway...

I tried your request in Postman and got the following response: