Python/Selenium web scraping: how to find the hidden src value from a links?

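Scraping links should be a simple feat, usually just a matter of grabbing the href value of the a tag.

I recently came across this website ( https://sunteccity.com.sg/promotions ) where the href value of each item's a tag cannot be found, yet the redirection still works. I'm trying to figure out a way to grab the items and their corresponding links. My typical Python Selenium code looks something like this: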

all_items = bot.find_elements_by_class_name('thumb-img')
for promo in all_items:
    a = promo.find_elements_by_tag_name("a")
    print("a[0]: ", a[0].get_attribute("href"))

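However, I can't seem to retrieve any href or onclick attributes, and I'm wondering whether this is even possible. I also noticed that I can't right-click the items or open their links in a new tab.

Is there any way to get the links of all these items?

Edit: Is there any way to retrieve all the links of the items on the page?

i.e.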

https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
...

Edit: Adding an image of one such anchor tag for better clarity: [image]

Reverse-engineering the JavaScript that takes you to the promotion pages (seen in https://sunteccity.com.sg/_nuxt/d4b648f.js ) gives you a way to get all the links, which are based on the HappeningID. You can verify this by running the following in the JS console, which returns the ID of the first promotion:

window.__NUXT__.state.Promotion.promotions[0].HappeningID

Based on this, you can write a Python loop to get all the promotions:

items = driver.execute_script("return window.__NUXT__.state.Promotion;")
for item in items["promotions"]:
    base = "https://sunteccity.com.sg/promotions/"
    happening_id = str(item["HappeningID"])
    print(base + happening_id)

This produces the following output:

https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
https://sunteccity.com.sg/promotions/764
https://sunteccity.com.sg/promotions/766
https://sunteccity.com.sg/promotions/762
https://sunteccity.com.sg/promotions/767
https://sunteccity.com.sg/promotions/732
https://sunteccity.com.sg/promotions/733
https://sunteccity.com.sg/promotions/735
https://sunteccity.com.sg/promotions/736
https://sunteccity.com.sg/promotions/737
https://sunteccity.com.sg/promotions/738
https://sunteccity.com.sg/promotions/739
https://sunteccity.com.sg/promotions/740
https://sunteccity.com.sg/promotions/741
https://sunteccity.com.sg/promotions/742
https://sunteccity.com.sg/promotions/743
https://sunteccity.com.sg/promotions/744
https://sunteccity.com.sg/promotions/745
https://sunteccity.com.sg/promotions/746
https://sunteccity.com.sg/promotions/747
https://sunteccity.com.sg/promotions/748
https://sunteccity.com.sg/promotions/749
https://sunteccity.com.sg/promotions/750
https://sunteccity.com.sg/promotions/753
https://sunteccity.com.sg/promotions/755
https://sunteccity.com.sg/promotions/756
https://sunteccity.com.sg/promotions/757
https://sunteccity.com.sg/promotions/758
https://sunteccity.com.sg/promotions/759
https://sunteccity.com.sg/promotions/760
https://sunteccity.com.sg/promotions/761
https://sunteccity.com.sg/promotions/763
https://sunteccity.com.sg/promotions/765
https://sunteccity.com.sg/promotions/730
https://sunteccity.com.sg/promotions/734
https://sunteccity.com.sg/promotions/623
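
The URL-building step above can be separated from the browser call so that it is easy to check on its own. The following is only a sketch: the helper name build_promotion_urls and the sample data are hypothetical, and it assumes the state object has the shape described above (a "promotions" list whose entries carry a HappeningID).

```python
from urllib.parse import urljoin

# Base URL taken from the answer above; the trailing slash matters for urljoin.
BASE_URL = "https://sunteccity.com.sg/promotions/"


def build_promotion_urls(promotion_state, base_url=BASE_URL):
    """Return one URL per promotion, in page order, skipping repeats."""
    urls = []
    seen = set()
    for item in promotion_state.get("promotions", []):
        happening_id = item.get("HappeningID")
        if happening_id is None:
            continue  # entry without an ID: nothing to link to
        url = urljoin(base_url, str(happening_id))
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical sample shaped like the state object described above.
sample_state = {
    "promotions": [
        {"HappeningID": 724},
        {"HappeningID": "731"},
        {"Title": "entry without an ID"},
        {"HappeningID": 724},
    ]
}
print(build_promotion_urls(sample_state))
```

With a live driver, the same helper would presumably be fed the value returned by driver.execute_script("return window.__NUXT__.state.Promotion;"), as in the loop above.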

You are using the wrong locator. It matches a lot of irrelevant elements.
Instead of find_elements_by_class_name('thumb-img'), try find_elements_by_css_selector('.collections-page .thumb-img'), so your code becomes:

all_items = bot.find_elements_by_css_selector('.collections-page .thumb-img')
for promo in all_items:
    a = promo.find_elements_by_tag_name("a")
    print("a[0]: ", a[0].get_attribute("href"))

You can also get the desired links directly with the .collections-page .thumb-img a locator, so your code could be:

links = bot.find_elements_by_css_selector('.collections-page .thumb-img a')
for link in links:
    print(link.get_attribute("href"))
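
In recent Selenium 4 releases the find_elements_by_* helpers are no longer available, so the same lookup may need to be written with find_elements and a By locator. The sketch below shows one way to do that; the helper name collect_hrefs is hypothetical, and the selector is the one suggested above. It skips elements whose href is missing, so if the anchors really carry no href, as the question reports, the result will simply be empty.

```python
try:
    from selenium.webdriver.common.by import By
    CSS_SELECTOR = By.CSS_SELECTOR
except ImportError:
    # Fallback so the helper can be read and tested without Selenium installed;
    # this is the same string value that By.CSS_SELECTOR holds.
    CSS_SELECTOR = "css selector"


def collect_hrefs(driver, selector=".collections-page .thumb-img a"):
    """Return the non-empty href values of every element matching selector."""
    hrefs = []
    for link in driver.find_elements(CSS_SELECTOR, selector):
        href = link.get_attribute("href")
        if href:  # get_attribute returns None when the attribute is absent
            hrefs.append(href)
    return hrefs
```

Usage would look like print(collect_hrefs(bot)) once the promotions page has loaded in the bot driver from the question.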

The <img> tags that are descendants of the <div class="thumb-img"> elements don't have href or onclick attributes, but they do have a src attribute.

To print the values of the src attribute you need to use WebDriverWait with presence_of_all_elements_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

     driver.get("https://sunteccity.com.sg/promotions")
     print([my_elem.get_attribute("src") for my_elem in WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "ul.collections div.thumb-img>a>img")))])
  • Using XPATH:

     driver.get("https://sunteccity.com.sg/promotions")
     print([my_elem.get_attribute("src") for my_elem in WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//ul[contains(@class, 'collections')]//div[@class='thumb-img']/a/img")))])
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
  • Console Output:

     ['https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2753-0605_Marcom_New_StoresWebsite_LandingPage_06122021__1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4748-0608_Marcom_CNY2022_Digital_FA_1536x882px_EATS-09.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6775-Website-Promotion-1536%28w%29-x-882%28h%29_-_annchi_sac.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4106-1536x882_-_Umistrong.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/8883-Woptics_Metaform_KV_360W_x_260H.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/320-TRU_LNY_campaign_Website_Promotion_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/9035-GintellCNY-Digital-Marketing_Singapore_1536x882_Rev-C.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/1605-Website_Promotion__Organic_Hair_Regrowth_Solutions.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/5125-website_image_-_PY.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/7462-Martiangear2._Website_Promotion_1536%28w%29_x_882%28h%29_%281%29.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/9576-BBQSuntec_WebsitePromo.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/7265-Nimisski_suntec_2_-_mandy_oh.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4106-1536x882_-_Umistrong.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4982-HLA_Website.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2197-bh_cny_2022_%281536_x_882_px%29.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/7657-%281536x882%29_Hair_Plus_-_Suntec_City_Website_Promotion_-_Wee.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/8834-fz_cny_04_-_Sherman_Fu.jpg', 
'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2742-White_Restaurant_Website.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2424-BWCJ_Chinese_New_Year_Special_Bundle_1536_X_882_no_text.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2812-EYS_20-Dec-Hamper-1536x882-r1_-_Bok_kok_wai.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6476-Superpark_20off_%281536_x_882_px%29_%282%29.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/921-TB_CNY_FieryFeastSet_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6770-Recoil_Suntec_Website_Promotion_%281%29.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/1797-morganfield_website.jpeg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/273-1536x882_-_Ruth_NgTSB.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/8610-DEC-SuntecCity-CNY2022-TigerPlushToy-Banner-1536x882_-_Shiau_Chen_Lim.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/5460-SG_Scanteak_CNY2022_SUNTEC_DIGITALSCREEN-04_-_Scanteak_SG.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/3926-Singapore_min_tNew_Suntec_Web_Promo_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4308-Suntec-CNY22-1536x882_-_Elements_Wellness_Group.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/5923-PetLoversSuntecCity-CNY22-1536x882-Dec21.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/5906-Myths_%26_Legends_Website.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/8873-FANCL_Suntec_LNY_visual_websitepromo.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4584-Suntec-1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2898-1536_882_low_res_-_Theresa_StateSwim.png', 
'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6775-Website-Promotion-1536%28w%29-x-882%28h%29_-_annchi_sac.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/3288-Suntec_Advertising_LNY_Website_Promotion_-_Ilina_Sim.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/1437-Hans_2022_CNY_-_SUN_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6395-SC_Website_Highlights_1536x882px_ToTT_-_Ren_Qi_Quak.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/9201-Harvey_Norman_Electrical_%26_IT_lifestyle_V2_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/9017-EncikTanSuntec_City_-_Website_Highlights_%281536px_by_882px%29.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4748-0608_Marcom_CNY2022_Digital_FA_1536x882px_EATS-09.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/3852-FINAL_Promo_listing_1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2753-0605_Marcom_New_StoresWebsite_LandingPage_06122021__1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/1050-WebsiteHighlights1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/7312-TUES15_EATS_promolisting_01.jpg']
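
The console output above contains some image URLs more than once (for example, the 4106-1536x882_-_Umistrong.jpg entry appears twice). If only distinct images are wanted, the list can be de-duplicated while keeping the page order. This is a small sketch; the helper name unique_in_order is hypothetical and the sample list is a shortened excerpt of the output above.

```python
def unique_in_order(values):
    """Drop repeated values while keeping the first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


# Shortened excerpt of the console output above, including one repeat.
srcs = [
    "https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4106-1536x882_-_Umistrong.jpg",
    "https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/8883-Woptics_Metaform_KV_360W_x_260H.jpg",
    "https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4106-1536x882_-_Umistrong.jpg",
]
print(unique_in_order(srcs))
```

The same call could wrap the list comprehension passed to print() in either of the snippets above.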

