
Scraping lowes.com using Selenium and BeautifulSoup Price Issue

I am trying to scrape product details from lowes.com. This is the script I am running:

from bs4 import BeautifulSoup
from webdriver_manager.chrome import ChromeDriverManager
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option('prefs', {
    'geolocation': True
})

driver = webdriver.Chrome(ChromeDriverManager().install(),options=chrome_options)
#driver.execute_cdp_cmd("Page.setGeolocationOverride", {
#    "latitude": 34.052235,
#    "longitude": -118.243683,
#    "accuracy": 98
#})
driver.get("https://www.lowes.com/pd/Therma-Tru-Benchmark-Doors-Craftsman-Simulated-Divided-Light-Right-Hand-Inswing-Ready-To-Paint-Fiberglass-Prehung-Entry-Door-with-Insulating-Core-Common-36-in-x-80-in-Actual-37-5-in-x-81-5-in/1000157897")
driver.execute_script("window.scrollTo(0,document.body.scrollHeight/5)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*2)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*3)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*4)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*5)")
time.sleep(1)
content = driver.page_source
soup = BeautifulSoup(content,'html.parser')
imgs = soup.findAll("img", attrs={"class":"met-epc-item"})
for img in imgs:
    print(img.get("src"))
print("Price: "+soup.find("span", attrs={"class":"aPrice large"}).text)
brand = soup.find("a", attrs={"class":"Link__LinkStyled-RC__sc-b3hjw8-0 bYfcYt"})
print("brand url: "+ brand.get("href"))
print("brand name: "+ brand.text)
print("brand desc: "+soup.find("h1", attrs={"class":"style__HeaderStyle-PDP__y7vp5g-12 iMECxW"}).text)
driver.close()

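When I run this script, the price lookup fails with an error saying the element does not exist. Looking at the page in the Chrome instance opened by Selenium, I can see that the price is not shown; instead, a text box asks for a zip code, city, or state to show price and availability. When I enter any zip code, city, or state, nothing happens. When I refresh the page or open any other URL on the Lowe's website, it says access denied, and to open any other Lowe's URL I have to start a new Chrome instance with Selenium. Any suggestions on how to solve this and scrape the product correctly? I should also mention that when I open the website in my normal Chrome browser, it loads correctly, shows the price, and does not give me any access denied error, because we…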

The data you are looking for is embedded in the page in JSON format:

import re
import json
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}

url = "https://www.lowes.com/pd/Therma-Tru-Benchmark-Doors-Craftsman-Simulated-Divided-Light-Right-Hand-Inswing-Ready-To-Paint-Fiberglass-Prehung-Entry-Door-with-Insulating-Core-Common-36-in-x-80-in-Actual-37-5-in-x-81-5-in/1000157897"
t = requests.get(url, headers=headers).text

data = re.search(r"window\['__PRELOADED_STATE__'\] = (\{.*?\})<", t)
data = json.loads(data.group(1))

# uncomment to print all data:
# print(json.dumps(data, indent=4))

item_id = url.split("/")[-1]

print("Name:", data["productDetails"][item_id]["product"]["brand"])
print("Desc:", data["productDetails"][item_id]["product"]["description"])
print("Price:", data["productDetails"][item_id]["price"]["itemPrice"])

Prints:

Name: Therma-Tru Benchmark Doors
Desc: 36-in x 80-in Fiberglass Craftsman Right-Hand Inswing Ready to paint Unfinished Prehung Single Front Door with Brickmould
Price: 370
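
To see what the regular expression above captures without hitting the site, here is a minimal offline sketch. The helper name `extract_preloaded_state` and the miniature `sample_html` string are made up for illustration; only the pattern and the key names come from the answer above. It also returns None when the marker is missing, which is an assumption about how you might want to handle a blocked or changed page.

```python
import json
import re

# Same pattern as in the answer above: capture the object literal that is
# assigned to window['__PRELOADED_STATE__'] and ends right before a "<".
STATE_PATTERN = re.compile(r"window\['__PRELOADED_STATE__'\] = (\{.*?\})<")


def extract_preloaded_state(html):
    """Return the embedded state as a dict, or None if the marker is absent."""
    match = STATE_PATTERN.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Made-up miniature page, used only to exercise the function offline.
sample_html = (
    "<script>window['__PRELOADED_STATE__'] = "
    '{"productDetails": {"1000157897": {"price": {"itemPrice": 370}}}}'
    "</script>"
)

state = extract_preloaded_state(sample_html)
price = state["productDetails"]["1000157897"]["price"]["itemPrice"]
print("Price:", price)
```

Checking for None before indexing gives a clearer failure than the AttributeError you would get from calling `.group(1)` on a failed search.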

The data is available by sending a GET request to:

https://www.lowes.com/pd/1000157897/productdetail/1674/Guest

You can try this solution to get the data without Selenium (similar to @Andrej Kesely's answer, but the URL is different here).

import requests


id_ = "1000157897"
url = "https://www.lowes.com/pd/{}/productdetail/1674/Guest".format(id_)
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
}

response = requests.get(url, headers=headers).json()

print("Product price: $", response["productDetails"][id_]["price"]["itemPrice"])
print("Product brand:", response["productDetails"][id_]["product"]["brand"])
print("Product description:", response["productDetails"][id_]["product"]["description"])

Output:

Product price: $ 370
Product brand: Therma-Tru Benchmark Doors
Product description: 36-in x 80-in Fiberglass Craftsman Right-Hand Inswing Ready to paint Unfinished Prehung Single Front Door with Brickmould
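
If some products come back without a price or without an entry for the requested id, direct indexing raises a KeyError. The sketch below reads the same three fields more defensively. The function name `summarize_product` and the `sample_payload` dict are hypothetical; the key names (`productDetails`, `product`, `brand`, `description`, `price`, `itemPrice`) are the ones used in the answers above, and the idea that fields may be missing is an assumption, not something confirmed about the endpoint.

```python
def summarize_product(payload, item_id):
    """Pick brand, description and price out of a productdetail-style dict.

    Returns None when the item id is not present; individual fields are
    None when they are missing from the payload.
    """
    details = payload.get("productDetails", {}).get(item_id)
    if details is None:
        return None
    product = details.get("product") or {}
    price = details.get("price") or {}
    return {
        "brand": product.get("brand"),
        "description": product.get("description"),
        "price": price.get("itemPrice"),
    }


# Made-up payload mirroring the shape used above, so no request is needed.
sample_payload = {
    "productDetails": {
        "1000157897": {
            "product": {
                "brand": "Therma-Tru Benchmark Doors",
                "description": "36-in x 80-in Fiberglass Craftsman door",
            },
            "price": {"itemPrice": 370},
        }
    }
}

summary = summarize_product(sample_payload, "1000157897")
print("Product price: $", summary["price"])
print("Product brand:", summary["brand"])
```

With a live response you would pass `requests.get(url, headers=headers).json()` as `payload` and `id_` as `item_id`.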

