Accessing a JavaScript array of objects in a page header with Python and Selenium
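I'm building a bot to check the stock of various Ubiquiti UniFi devices from their store page (hey, these things are selling out fast), and I need some help. I've been looking for something like this all day, but nothing I've found here has worked.
I'm using the code below to access the UI.com store ( https://store.ui.com/ ). Very conveniently, they include the stock information for their products in the header of every page. I'm fetching the home page with Selenium and need to access: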
<script data-ot-ignore type="text/javascript">
window.APP_DATA = {
assets: {...},
cart: {"note":null,"attributes":{"quantity-hdds":"{\"4446782390361\"=\u003e{\"0\"=\u003e{\"sku\"=\u003e\"HDD-1TB\", \"ratio\"=\u003e\"1\"}, \"1\"=\u003e{\"sku\"=\u003e\"HDD-8TB\", \"ratio\"=\u003e\"1\"}}}"},"original_total_price":0,"total_price":0,"total_discount":0,"total_weight":0.0,"item_count":0,"items":[],"requires_shipping":false,"currency":"USD","items_subtotal_price":0,"cart_level_discount_applications":[]},
cartAccessories: [{
"id": 4446782390361,
"title": "Dream Machine Pro",
"handle": "udm-pro",
"url": "\/products\/udm-pro",
"tags": ["#HDD-1TB","#HDD-8TB","ALT","ALT::udm-pro","bestseller","enhanced-wizard","featured","mx29","recommended","redirect-wizard","related","UI::1U","UI::AI","UI::Cloud Key","UI::HDD","UI::Network","UI::SFP+","UI::UniFi","unifi"],
"featured_image": "//cdn.shopify.com/s/files/1/1439/1668/products/UDM-Pro_front-top-angle_53e97c87-61d9-4f3e-acad-6ba113bbf5de_small.png?v=1629983008",
"variants": [{
"id": 32264307703897,
"title": "Default Title",
"price": 37900,
"sku": "UDM-Pro",
"available": true,
"inventory_empty":false,
"inventory_policy": "deny",
"image": "//cdn.shopify.com/s/files/1/1439/1668/products/UDM-Pro_front-top-angle_53e97c87-61d9-4f3e-acad-6ba113bbf5de_small.png?v=1629983008"
},],
"data":{"for":{"product-vendors":[],"product-types":["VoIP","Access","Surveillance"]},"type":"UDM-PRO","view":{"default":["UAP-nanoHD-US","UAP-FlexHD-US","UWB-XG-US","UAP-IW-HD-US","UAP-AC-HD-US","UAP-AC-M-US","UAP-BeaconHD-US","UAP-AC-PRO-US","UAP-AC-LITE-US","UAP-AC-LR-US","UAP-AC-IW-US","UAP-AC-M-PRO-US","UAP-AC-SHD-US","UAP-XG-US","UAP-AC-EDU-US","USW-48-POE","USW-24","USW-Pro-24","USW-48-BETA","USW-Lite-16-PoE-BETA","USW-LEAF-BETA","USW-16-PoE","USW-24-PoE","USW-Pro-48-PoE","USW-Pro-24-PoE","USW-Pro-48o","USW-Industrial","UVC-G4-DoorBell","UVC-G3-FLEX","UP-Sense-BETA","UP-Sense","*","!UT-ATA-BETA","!UT-Conference-BETA"],"checkout":["!UDM-Pro","!UDM-US","UVC-G4-DoorBell","UWB-XG-US","UAP-AC-HD-US","UAP-FlexHD-US","UAP-IW-HD-US","UAP-nanoHD-US","UAP-BeaconHD-US","UAP-XG-US","UAP-AC-SHD-US","UAP-AC-EDU-US","UAP-AC-M-PRO-US","UAP-AC-PRO-US","UAP-AC-LR-US","UAP-AC-M-US","UAP-AC-IW-US","UAP-AC-LITE-US","U6-IW-US-BETA","U6-Extender-US-BETA","U6-Lite-US-BETA"],"bundle":["!UDM-Pro","!UDM-US","UVC-G4-DoorBell","UWB-XG-US","UAP-AC-HD-US","UAP-FlexHD-US","UAP-IW-HD-US","UAP-nanoHD-US","UAP-BeaconHD-US","UAP-XG-US","UAP-AC-SHD-US","UAP-AC-EDU-US","UAP-AC-M-PRO-US","UAP-AC-PRO-US","UAP-AC-LR-US","UAP-AC-M-US","UAP-AC-IW-US","UAP-AC-LITE-US"]},"priority":1,"description":"","countries":[]}},{
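Now, I don't have much experience with JavaScript, but it looks like the data I'm interested in is basically an array of JavaScript objects inside another object? (The [{}] structure of "cartAccessories", with everything inside it.) Inspecting the element in the page source gave me "/html/head/script[33]" as the XPath of the script... I think. It seems to return different data almost every time.
I'm using the following basic code to get the page: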
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
urlpage = 'https://store.ui.com/'
print(urlpage)
driver = webdriver.Firefox()
# get web page
driver.get(urlpage)
time.sleep(1)
print("Getting Results.")
results = driver.find_element(By.XPATH, "/html/head/script[33]")
html = results.get_attribute('innerHTML')
print(f"The results are: {html}")
driver.quit()
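But this doesn't seem right. I want to get the "cartAccessories" information into a Python list so I can work with it. What is the best way to access this information? Am I going about this all wrong?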
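You can use a regular expression to grab the overall JavaScript object that contains the array of interest, then pass it to hjson to handle the unquoted keys. Finally, extract the cartAccessories item and do what you want with it.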
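import re
import hjson  # third-party: pip install hjson
import requests
r = requests.get('https://store.ui.com/')
# Capture the object literal: everything after "window.APP_DATA = " up to the next "<".
match = re.search(r'window\.APP_DATA = (.*?)<', r.text, re.S)
data = hjson.loads(match.group(1))  # hjson accepts the unquoted keys that json rejects
print(data['cartAccessories'])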
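If you would rather stay in Selenium, one option is to let the browser hand the object over instead of locating the script tag by XPath: `execute_script` returns JavaScript arrays and objects as Python lists and dicts. The sketch below is only an illustration under some assumptions: the helper names `fetch_cart_accessories` and `in_stock_skus` are made up for this example, the page is assumed to still define `window.APP_DATA` when the script runs, and the sample data is trimmed from the snippet in the question.

```python
from typing import Any, Dict, List


def fetch_cart_accessories(driver: Any) -> List[Dict[str, Any]]:
    """Ask the browser for window.APP_DATA.cartAccessories.

    `driver` is assumed to be a Selenium WebDriver whose page has already
    loaded. Selenium converts the returned JavaScript array of objects
    into a Python list of dicts.
    """
    # Fall back to an empty array if the page does not define the data.
    script = "return (window.APP_DATA || {}).cartAccessories || [];"
    return driver.execute_script(script)


def in_stock_skus(accessories: List[Dict[str, Any]]) -> List[str]:
    """Return the SKU of every variant whose "available" flag is true."""
    skus = []
    for product in accessories:
        for variant in product.get("variants", []):
            if variant.get("available"):
                skus.append(variant["sku"])
    return skus


# Sample shaped like the snippet in the question (only the fields used here).
sample = [
    {
        "title": "Dream Machine Pro",
        "handle": "udm-pro",
        "variants": [
            {"id": 32264307703897, "sku": "UDM-Pro", "price": 37900, "available": True},
        ],
    },
]
print(in_stock_skus(sample))  # prints ['UDM-Pro']
```

With the code from the question, `fetch_cart_accessories(driver)` would be called after `driver.get(urlpage)`. The list in `data['cartAccessories']` from the requests/hjson approach can be passed to `in_stock_skus` the same way.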