
Accessing a javascript array of objects in a page header with python and selenium

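I'm building a bot to check the stock of various Ubiquiti Unifi devices from their store page (hey, this stuff is going fast) and I need a bit of help. I have searched for something like this all day, and nothing I've seen here has worked.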

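I'm using the code further down to access the UI.com store ( https://store.ui.com/ ). They very conveniently include the in-stock product information in the header of every page. I'm using selenium to fetch the main page and need to access: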

<script data-ot-ignore type="text/javascript">
  window.APP_DATA = {
    assets: {...},

    cart: {"note":null,"attributes":{"quantity-hdds":"{\"4446782390361\"=\u003e{\"0\"=\u003e{\"sku\"=\u003e\"HDD-1TB\", \"ratio\"=\u003e\"1\"}, \"1\"=\u003e{\"sku\"=\u003e\"HDD-8TB\", \"ratio\"=\u003e\"1\"}}}"},"original_total_price":0,"total_price":0,"total_discount":0,"total_weight":0.0,"item_count":0,"items":[],"requires_shipping":false,"currency":"USD","items_subtotal_price":0,"cart_level_discount_applications":[]},

    cartAccessories: [{
          "id": 4446782390361,
          "title": "Dream Machine Pro",
          "handle": "udm-pro",
          "url": "\/products\/udm-pro",
          "tags": ["#HDD-1TB","#HDD-8TB","ALT","ALT::udm-pro","bestseller","enhanced-wizard","featured","mx29","recommended","redirect-wizard","related","UI::1U","UI::AI","UI::Cloud Key","UI::HDD","UI::Network","UI::SFP+","UI::UniFi","unifi"],
          "featured_image": "//cdn.shopify.com/s/files/1/1439/1668/products/UDM-Pro_front-top-angle_53e97c87-61d9-4f3e-acad-6ba113bbf5de_small.png?v=1629983008",
          "variants": [{
                  "id": 32264307703897,
                  "title": "Default Title",
                  "price": 37900,
                  "sku": "UDM-Pro",
                  "available": true,
                  "inventory_empty":false,
                  "inventory_policy": "deny",
                  "image": "//cdn.shopify.com/s/files/1/1439/1668/products/UDM-Pro_front-top-angle_53e97c87-61d9-4f3e-acad-6ba113bbf5de_small.png?v=1629983008"
                },],
          "data":{"for":{"product-vendors":[],"product-types":["VoIP","Access","Surveillance"]},"type":"UDM-PRO","view":{"default":["UAP-nanoHD-US","UAP-FlexHD-US","UWB-XG-US","UAP-IW-HD-US","UAP-AC-HD-US","UAP-AC-M-US","UAP-BeaconHD-US","UAP-AC-PRO-US","UAP-AC-LITE-US","UAP-AC-LR-US","UAP-AC-IW-US","UAP-AC-M-PRO-US","UAP-AC-SHD-US","UAP-XG-US","UAP-AC-EDU-US","USW-48-POE","USW-24","USW-Pro-24","USW-48-BETA","USW-Lite-16-PoE-BETA","USW-LEAF-BETA","USW-16-PoE","USW-24-PoE","USW-Pro-48-PoE","USW-Pro-24-PoE","USW-Pro-48o","USW-Industrial","UVC-G4-DoorBell","UVC-G3-FLEX","UP-Sense-BETA","UP-Sense","*","!UT-ATA-BETA","!UT-Conference-BETA"],"checkout":["!UDM-Pro","!UDM-US","UVC-G4-DoorBell","UWB-XG-US","UAP-AC-HD-US","UAP-FlexHD-US","UAP-IW-HD-US","UAP-nanoHD-US","UAP-BeaconHD-US","UAP-XG-US","UAP-AC-SHD-US","UAP-AC-EDU-US","UAP-AC-M-PRO-US","UAP-AC-PRO-US","UAP-AC-LR-US","UAP-AC-M-US","UAP-AC-IW-US","UAP-AC-LITE-US","U6-IW-US-BETA","U6-Extender-US-BETA","U6-Lite-US-BETA"],"bundle":["!UDM-Pro","!UDM-US","UVC-G4-DoorBell","UWB-XG-US","UAP-AC-HD-US","UAP-FlexHD-US","UAP-IW-HD-US","UAP-nanoHD-US","UAP-BeaconHD-US","UAP-XG-US","UAP-AC-SHD-US","UAP-AC-EDU-US","UAP-AC-M-PRO-US","UAP-AC-PRO-US","UAP-AC-LR-US","UAP-AC-M-US","UAP-AC-IW-US","UAP-AC-LITE-US"]},"priority":1,"description":"","countries":[]}},{

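Now, I don't have much experience with javascript, but it looks like the data I'm interested in is basically an array of javascript objects inside another object? (The [{}] structure of "cartAccessories", with everything inside it.) Inspecting the element in the page source gave me "/html/head/script[33]" as the XPATH of the script... I think. It seems to return different data nearly every time.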

I'm using the following basic code to get the page:

from selenium import webdriver
from selenium.webdriver.common.by import By

import time

urlpage = 'https://store.ui.com/'
print(urlpage)
driver = webdriver.Firefox()

# get web page
driver.get(urlpage)
time.sleep(1)

print("Getting Results.")
results = driver.find_element(By.XPATH, "/html/head/script[33]")
html = results.get_attribute('innerHTML')
print(f"The results are: {html}")
driver.quit()

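But that doesn't seem right. I'd like to get the "cartAccessories" information into a python list so I can process it. What's the best way to access this information? Am I going about this all wrong?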

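You can use a regex to grab the overall JavaScript object that contains the array of interest, then pass it to hjson, which handles the unquoted keys. Finally, extract the cartAccessories item and do what you want with it.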

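import requests, re, hjson  # hjson is third-party (pip install hjson)

r = requests.get('https://store.ui.com/')
# Capture everything assigned to window.APP_DATA, up to the next '<'
match = re.search(r'window\.APP_DATA = (.*?)<', r.text, re.S)
# hjson accepts the unquoted keys that the standard json module rejects
data = hjson.loads(match.group(1))
print(data['cartAccessories'])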
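
If you would rather stay with selenium, one possible alternative is to ask the browser for the object directly instead of locating the script tag by XPATH. The sketch below assumes the page has already run its script, so that window.APP_DATA exists, and that each entry has the "title" and "variants" fields shown in the question. The helper names get_cart_accessories and summarize_stock are made up for illustration, and SAMPLE is a trimmed copy of the data from the question, not live store data.

```python
def get_cart_accessories(driver):
    """Return window.APP_DATA.cartAccessories as a Python list of dicts.

    `driver` is any Selenium WebDriver that has already loaded the page.
    execute_script converts JavaScript arrays and objects into Python
    lists and dicts. If APP_DATA is missing, an empty list is returned.
    """
    return driver.execute_script(
        "return (window.APP_DATA || {}).cartAccessories || [];"
    )


def summarize_stock(cart_accessories):
    """Flatten the product list into one row per variant."""
    rows = []
    for product in cart_accessories:
        for variant in product.get("variants", []):
            rows.append({
                "title": product.get("title"),
                "sku": variant.get("sku"),
                "available": bool(variant.get("available")),
            })
    return rows


# Trimmed sample shaped like the data in the question (not live data).
SAMPLE = [{
    "id": 4446782390361,
    "title": "Dream Machine Pro",
    "handle": "udm-pro",
    "variants": [{
        "id": 32264307703897,
        "sku": "UDM-Pro",
        "available": True,
        "inventory_policy": "deny",
    }],
}]

for row in summarize_stock(SAMPLE):
    print(row)

# Usage with a real browser (requires selenium and a Firefox driver):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   try:
#       driver.get("https://store.ui.com/")
#       print(summarize_stock(get_cart_accessories(driver)))
#   finally:
#       driver.quit()
```

Either route ends with a plain Python list of dicts, so the same summarize_stock step should also work on data['cartAccessories'] from the requests/hjson version, provided the page still exposes the same fields.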
