How to scrape all links of products using Selenium and Python?

There is a webpage with 42 products. I would like to get the links of all 42 products so that I can scrape each one individually. But when I try to get them, I only get 16-20 of them. I used two approaches:

  1. I got the page source using Selenium, then parsed it with BeautifulSoup.
  2. I used only the Selenium driver (css_selector, class_name) to get the links.

The link I need to scrape: https://thrivecausemetics.com/collections/all?page=4&sort=ss_days_since_published%253Dasc

My first approach:

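import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()

webpage = "https://thrivecausemetics.com/collections/all?page=4&sort=ss_days_since_published%253Dasc"
driver.get(webpage)
time.sleep(15)  # fixed wait for the JavaScript-rendered product grid

# Parse the rendered HTML and collect the href of every product tile
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
links = [link['href'] for link in soup.find("ul", class_="grid-list").find_all('a', class_='tile-images')]
print(links)
print(len(links))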

My second approach:

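import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
webpage = "https://thrivecausemetics.com/collections/all?page=4&sort=ss_days_since_published%253Dasc"
driver.get(webpage)
time.sleep(15)  # fixed wait for the JavaScript-rendered product grid
ul_tag = driver.find_element(By.CSS_SELECTOR, "ul.grid-list")
print(ul_tag)
# Count the product tiles currently present inside the grid
li_tags = ul_tag.find_elements(By.CSS_SELECTOR, "li.grid-item.is-visible")
# print(li_tags)
print(len(li_tags))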

Neither approach gets all the links. With the code above, I get only 16 product links.

Any help is appreciated.

Try this code:

ul_tag = driver.find_elements(By.CSS_SELECTOR, ".grid-list.text-.align- .grid-item.is-visible .tile-heading-lockup a")
print("Total products: ", len(ul_tag))
for product_link in ul_tag:
    print("Product link: ", product_link.get_attribute("href"))

Output:

Total products: 42

Product link: https://thrivecausemetics.com/products/brilliant-eye-brightener

Product link: https://thrivecausemetics.com/products/liquid-lash-extensions-mascara

Product link: https://thrivecausemetics.com/products/waterproof-eyeliner

Product link: https://thrivecausemetics.com/products/sheer-strength-hydrating-lip-tint

Product link: https://thrivecausemetics.com/products/infinity-waterproof-eyeshadow-stick

Product link: https://thrivecausemetics.com/products/triple-threat-color-stick

Product link: https://thrivecausemetics.com/products/infinity-waterproof-brow-liner

Product link: https://thrivecausemetics.com/products/instant-brow-fix-semi-permanent-eyebrow-gel

Product link: https://thrivecausemetics.com/products/liquid-lash-extensions-lash-serum

Product link: https://thrivecausemetics.com/products/buildable-blur-cc-cream-with-spf-35

and so on...

Follow this video to scrape all the links:

https://youtu.be/5qweI46pfyY

That data is pulled from an API endpoint by JavaScript once the page loads, so a plain request for the page HTML cannot see it. The way forward is to scrape the actual API endpoint (you can find it in Dev tools, under the Network tab). Here is one way to obtain that data:

import requests
import pandas as pd

url = 'https://b7i79y.a.searchspring.io/api/search/search.json?resultsFormat=native&page=1&resultsPerPage=500&sort.ss_days_since_published=asc&siteId=b7i79y'

r = requests.get(url)
df = pd.json_normalize(r.json()['results'])
print(df)

This will display in the terminal:

brand   collection_id   handle  id  imageUrl    intellisuggestData  intellisuggestSignature msrp    name    popularity  price   product_type_unigram    rating  ratingCount reviews_total_reviews   sku ss_available    ss_image_alt    ss_inventory_count  ss_name_type    tags    thumbnailImageUrl   uid url variant_id  variant_mfield_filter_color
0   Bigger Than Beauty Skincare [159254708314, 174020034650, 262184763482, 263320010842]    pumpkin-spice-latte-liquid-balm-treatment   bed045c1cec90548f830bfa4bc3e2e56    https://cdn.shopify.com/s/files/1/0582/2885/products/PSL_Component_1_medium.jpg?v=1662478574    eJxKMs80t6xkYGAICXM3NDZhMGQwZDBgMLdgSC_KTAEEAAD__1t7Bhw 5a3173ae3360eadabcc446e464c51a6269f0e28ab8d79b2be8b1da2b0f0201da    0   Pumpkin Spice Latte Liquid Balm Lip Treatment™  10669   26  treatment   4.45424 295 295 TVG134  1   https://cdn.shopify.com/s/files/1/0582/2885/products/PSL_Swatch_New_medium.jpg?v=1662478574 20060   lip treatment   [2261, 4522, 50, 800, Benefits:Hydrating, Benefits:Plumping, collection-badge::BACK IN STOCK!, collection::hide-variants, Face, Fill Size:< 1 fl oz, linked::liquid-balm-set, lip plumper, lip plumping, plump, plumper, plumping, recommendation::all-skincare, Skin Concern:Dull and Dry Skin, swatches::show, travel size, Vegan] https://cdn.shopify.com/s/files/1/0582/2885/products/PSL_Component_1_medium.jpg?v=1662478574    4742230212698   https://thrive-causemetics.myshopify.com/products/pumpkin-spice-latte-liquid-balm-treatment [32526428766298]    NaN
1   Thrive Causemetics  NaN dream-lash-duo  26b794e35fad33ba5496223db9f1bed4    https://cdn.shopify.com/s/files/1/0582/2885/products/Mascara_LashSerum_PDPSets_medium.jpg?v=1659461093  eJxKMs80t6xkYGAICXM3NDZhMGQwYjBgMLdgSC_KTAEEAAD__1uGBh0 12ef5b3a76c62cc8e9d2b0f6f2b2341a3903bbc584f3c347b96f6b9d67f38c05    0   Dream Lash Duo  NaN 71  duo NaN NaN NaN NaN 1   https://cdn.shopify.com/s/files/1/0582/2885/products/Mascara_LashSerum_PDPSets_nocopy_medium.jpg?v=1659491015   274 dream lash duo  [collection::hide-variants, YBlacklist] https://cdn.shopify.com/s/files/1/0582/2885/products/Mascara_LashSerum_PDPSets_medium.jpg?v=1659461093  6766529675354   https://thrive-causemetics.myshopify.com/products/dream-lash-duo    [40035119235162, 40035119267930, 40035119300698]    NaN
2   Thrive Causemetics  NaN liquid-lash-extensions-lash-serum   096bf1756363b494a31863ae20803818    https://cdn.shopify.com/s/files/1/0582/2885/products/LashSerum_Component_medium.jpg?v=1659566057    eJxKMs80t6xkYGAICXM3MrNgMGQwZjBgMLdgSC_KTAEEAAD__1wOBiY 5fa69eead6ac5da701c5be908298ba006e9490183a18de4e398e7599d0a01eb6    0   Liquid Lash Extensions™ Lash Serum  21949   56  serum   4.075   40  40  TVG268  1   NaN 75132   lash serum  [collection-badge::New!]    https://cdn.shopify.com/s/files/1/0582/2885/products/LashSerum_Component_medium.jpg?v=1659566057    6729553772634   https://thrive-causemetics.myshopify.com/products/liquid-lash-extensions-lash-serum [39909600854106]    NaN
3   Thrive Causemetics  [267668095066]  brilliant-face-highlighter-skin-perfecting-powder   9ff61df38853620f61d4c39e7363f5a2    https://cdn.shopify.com/s/files/1/0582/2885/products/Brilliant-Face-Highlighter_Component_ToQuyen_medium.jpg?v=1657292791   eJxKMs80t6xkYGAICXM3MjJnMGQwYTBgMLdgSC_KTAEEAAD__1vKBiI 7d651c91af12c272ce4478e268a2764530f5df50b7a4a78817eaeda1251cd85b    0   Brilliant Face Highlighter™ Skin Perfecting Powder  12525   34  highlighter 4.18182 66  66  TVG227  1   https://cdn.shopify.com/s/files/1/0582/2885/products/Brilliant-Face-Highlighter_Component_Shael_medium.jpg?v=1657292793 44920   highlighter [collection-badge::trending, Highlight, Highlighter, Highlighting]  https://cdn.shopify.com/s/files/1/0582/2885/products/Brilliant-Face-Highlighter_Component_ToQuyen_medium.jpg?v=1657292791   6729555247194   https://thrive-causemetics.myshopify.com/products/brilliant-face-highlighter-skin-perfecting-powder [39909605703770, 39909605736538, 39909605769306]    [gold]
4   Thrive Causemetics  NaN brilliant-face-set  dab6ca20bb4cf41740cacbbc37fb4f20    https://cdn.shopify.com/s/files/1/0582/2885/products/Highlighter_BEB_Primer_Set_PDP_medium.jpg?v=1657585503 eJxKMs80t6xkYGAICXM3MjJnMGQwZTBgMLdgSC_KTAEEAAD__1vVBiM c0e8eb4abe31f0bd31324450909de60b570dafb73a7e5f6f4c3cb49e93b7a9e4    0   Brilliant Face Set  NaN 84  sets    NaN NaN NaN NaN 1   https://cdn.shopify.com/s/files/1/0582/2885/products/Highlighter_BEB_Primer_Set_V2_medium.jpg?v=1657585503  3889    brilliant face sets [collection-badge::New!, collection::hide-variants, ST-unpublished] https://cdn.shopify.com/s/files/1/0582/2885/products/Highlighter_BEB_Primer_Set_PDP_medium.jpg?v=1657585503 6765261324378   https://thrive-causemetics.myshopify.com/products/brilliant-face-set    [40031327682650, 40031327715418, 40031327748186, 40031327780954, 40031327813722, 40031327846490, 40031327879258, 40031327912026, 40031327944794, 40031327977562, 40031328010330, 40031328043098, 40031328075866, 40031328108634, 40031328141402, 40031328174170, 40031328206938, 40031328239706, 40031328272474, 40031328305242, 40031328338010, 40031328370778, 40031328403546, 40031328436314, 40031328469082, 40031328501850, 40031328534618, 40031328567386, 40031328600154, 40031328632922, 40031328665690, 40031328698458, 40031328731226, 40031328763994, 40031328796762, 40031328829530, 40031328862298, 40031328895066, 40031328927834]    NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
73  Thrive Causemetics  [27186779, 209907910, 237619142, 333714566, 399044812, 2738815001, 4013293593, 5575639065, 5575770137, 5575802905, 6686801945, 56789368922, 56833736794, 57405112410, 81244520538, 82036260954, 82769608794, 83615547482, 84961689690, 85599420506, 86598778970, 87577591898, 88078483546, 89189417050, 89505661018, 89797230682, 91221393498, 91755741274, 93260021850, 96613498970, 149781872730, 153671794778, 157076848730, 159474384986, 166734495834, 173874675802, 262216908890, 262273564762, 263767261274, 263970160730, 264544747610, 264579055706, 264579121242, 265485779034, 266068197466, 266346889306, 266381164634, 266457251930, 266889232474, 267015094362, 267954094170] triple-threat-color-stick   908b72d51839441d48d27fc251340e88    https://cdn.shopify.com/s/files/1/0582/2885/products/TCCS_Triple_Threat_Color_Stick_Isabella_V2_2db06b39-24da-4e68-8029-dab1f68a985e_medium.jpg?v=1601483873    eJxKMs80t6xkYGAICXM3NDVhMGQwN2EwYDC3YEgvykwBBAAA__9h-AZY    67aceb71cc8a5bfa95da136235f5c18504688f10daed921caa8efe7e75dcfa8d    0   Triple Threat™ Color Stick  35732   36  threat  4.42416 3171    3171    TVG154  1   https://cdn.shopify.com/s/files/1/0582/2885/products/TCCS_Triple_Threat_Color_Stick_Mieko_V2_e92bcbce-3708-4941-a23b-5122e9881820_medium.jpg?v=1601483873   164632  triple threat   [Benefits:Hydrating, Benefits:Waterproof, Best Sellers, blush, body, collection-badge::Multi-Use!, Coverage:Buildable, Finish:Dewy, Finish:Shimmer, Formulation:Cream, intl::ca, Lips, Lipstick, recommendation::face, shade-finder::thumbnails, Triple Threat Color Stick, TVG285, TVG286, TVG287, Vegan, YCRF_cheeks] https://cdn.shopify.com/s/files/1/0582/2885/products/TCCS_Triple_Threat_Color_Stick_Isabella_V2_2db06b39-24da-4e68-8029-dab1f68a985e_medium.jpg?v=1601483873    5892103302  https://thrive-causemetics.myshopify.com/products/triple-threat-color-stick [32456620376154, 18635615622, 18635615558, 32456620310618, 32456620408922, 18635615430, 18635615686, 40078997586010, 
40078998175834, 40078999191642]    [pink, gold, purple, red, peach]
74  Thrive Causemetics  [27186779, 209907910, 237619142, 343406086, 383763660, 2738815001, 4013293593, 6686801945, 6845464601, 57475530842, 81244487770, 81951588442, 82036260954, 82769608794, 83476283482, 83810091098, 86001451098, 86594289754, 86765207642, 88078483546, 91221393498, 93260021850, 93929963610, 94846025818, 149781872730, 151323705434, 157076848730, 159474384986, 162671919194, 166112591962, 263766736986, 264805384282, 266185965658, 267195973722]   infinity-waterproof-brow-liner  05a0b2ec067e0d40becc91a6d7ff10a9    https://cdn.shopify.com/s/files/1/0582/2885/products/BrowLiner_Component_Christina_medium.jpg?v=1637091941  eJxKMs80t6xkYGAICXM3MLRgMGQwN2UwYDC3YEgvykwBBAAA__9h7QZY    c25ffd3a46285a1f90da35783af2fb62d7900453d9fa7fbd7e288f8fd13b9f1d    0   Infinity Waterproof Eyebrow Liner™  39291   23  liner   4.49396 2235    2235    TVG018  1   https://cdn.shopify.com/s/files/1/0582/2885/products/BrowLiner_Component_Audrey_medium.jpg?v=1637091946 209279  brow liner  [Benefits:Waterproof, Coverage:Buildable, default_variant::2, Infinity Waterproof Brow Liner, Ingredients:Shea Butter, intl::ca, recommendation::eyes, shade-finder::thumbnails, Vegan, YCRF_eyes]  https://cdn.shopify.com/s/files/1/0582/2885/products/BrowLiner_Component_Christina_medium.jpg?v=1637091941  781737155   https://thrive-causemetics.myshopify.com/products/infinity-waterproof-brow-liner    [2199676227, 39591112081498, 2199676163, 35014122444, 39591112343642]   [beige, red, brown, black, grey]
75  Thrive Causemetics  [27186779, 91101891, 5576228889, 81244487770, 174020034650] gift-card   15c74aab8aa83d300f8c66cdca7c1cb1    https://cdn.shopify.com/s/files/1/0582/2885/products/egift-card_1__2_medium.png?v=1659654650    eJxKMs80t6xkYGAICXM3MLRgMGQwN2MwYDC3YEgvykwBBAAA__9h-AZZ    7ba842c650d93f11c8936dfaf70818667c3b3024b44fc6688ee25874d0ccf019    0   eGift Card  NaN 25  card    5   11  11  NaN 1   https://cdn.shopify.com/s/files/1/0582/2885/products/Thrive_PDP_GiftCard_medium.jpg?v=1659654650    -16399  gift card   [::hide-dropdown-swatch, collection::hide-variants, Gift Cards, image::no-swap, intl::ca, swag, YBlacklist] https://cdn.shopify.com/s/files/1/0582/2885/products/egift-card_1__2_medium.png?v=1659654650    337553443   https://thrive-causemetics.myshopify.com/products/gift-card [12622098246, 782092871, 12622102150, 782092875]    NaN
76  Thrive Causemetics  [27186779, 237619142, 343406086, 389141580, 81244520538, 91755741274, 153671794778, 157076848730, 159474384986] jackie  4f5bf8da32c6904051e29c44b49a4516    https://cdn.shopify.com/s/files/1/0582/2885/products/Jackie_Faux_Lashes_1_medium.jpg?v=1582596256   eJxKMs80t6xkYGAICXM3NDdiMGQwN2cwYDC3YEgvykwBBAAA__9iGwZb    3425b4b7cf441485cfbc7cc37da68ae40efbe65e842df16229f0c1c29b172b7d    0   Jackie Faux Lashes™ 150 26  lashes  4.85714 14  14  TVG172  1   https://cdn.shopify.com/s/files/1/0582/2885/products/PDP_lashes_jackie_1024x1024_1_medium.jpg?v=1582596246  827 faux lashes [Faux Lashes, recommendation::eyes, swatches::hide, Vegan, YCRF_eyes]   https://cdn.shopify.com/s/files/1/0582/2885/products/Jackie_Faux_Lashes_1_medium.jpg?v=1582596256   334825111   https://thrive-causemetics.myshopify.com/products/jackie    [775766255] NaN
77  Thrive Causemetics  [27186779, 237619142, 343406086, 389141580, 81244520538, 91755741274, 157076848730, 159474384986]   robin   cfe488b97e5e61b13c3060260a920885    https://cdn.shopify.com/s/files/1/0582/2885/products/Robin_Faux_Lashes_medium.jpg?v=1582233291  eJxKMs80t6xkYGAICXM3NDdmMGQwt2AwABHpRZkpgAAAAP__YjYGXQ  3b387d1b5edb7855f9d83403ddac5c5559ddac6a8ea440cde30447897896cfa6    0   Robin Faux Lashes™  130 26  lashes  4.9 10  10  TVG173  1   https://cdn.shopify.com/s/files/1/0582/2885/products/PDP_lashes_robin_1024x1024_7a28a8f4-b602-4049-9480-6eddb8e94944_medium.jpg?v=1582233282    2152    faux lashes [Faux Lashes, recommendation::eyes, swatches::hide, Vegan, YCRF_eyes]   https://cdn.shopify.com/s/files/1/0582/2885/products/Robin_Faux_Lashes_medium.jpg?v=1582233291  334825555   https://thrive-causemetics.myshopify.com/products/robin [775768027] NaN
78 rows × 26 columns
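Since the question asks for product links, here is a minimal sketch that turns the `handle` column into storefront URLs. It assumes each product lives at `/products/<handle>` on thrivecausemetics.com, which matches the links printed by the Selenium answer earlier but is not confirmed by the API itself. The names `STOREFRONT` and `product_links` are made up for this illustration.

```python
from urllib.parse import urljoin

# Assumption: the storefront serves each product at /products/<handle>,
# as the Selenium output earlier in this thread suggests.
STOREFRONT = "https://thrivecausemetics.com"


def product_links(results, base=STOREFRONT):
    """Return one storefront URL per result, skipping entries without a handle."""
    links = []
    for item in results:
        handle = item.get("handle")
        if handle:
            links.append(urljoin(base, f"/products/{handle}"))
    return links


if __name__ == "__main__":
    # Two handles taken from the output shown above.
    sample = [
        {"handle": "dream-lash-duo"},
        {"handle": "triple-threat-color-stick"},
    ]
    for link in product_links(sample):
        print(link)
```

With the response from the code above, this would be called as `product_links(r.json()['results'])`.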

The actual XHR request asks for only 12 products (and then keeps asking for more as you scroll the page). I went ahead and asked for 500 products (see the URL) to make sure I get them all.
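If you would rather stay with Selenium, the scroll-triggered loading described above suggests scrolling until no new products appear before collecting the links. The following is only a sketch: `scroll_until_stable` and its parameters are hypothetical names, and the pause length and stop condition are guesses that may need tuning for this site.

```python
import time


def scroll_until_stable(driver, count_items, pause=2.0, max_rounds=20):
    """Scroll to the bottom until the item count stops growing.

    driver      -- a Selenium WebDriver (anything with execute_script works)
    count_items -- zero-argument callable returning how many items are loaded
    """
    seen = count_items()
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the next XHR batch time to arrive
        now = count_items()
        if now == seen:  # nothing new was loaded, so stop
            break
        seen = now
    return seen


# Possible use with the selector from the question (requires
# `from selenium.webdriver.common.by import By`):
# total = scroll_until_stable(
#     driver,
#     lambda: len(driver.find_elements(By.CSS_SELECTOR, "ul.grid-list li.grid-item")),
# )
```

Stopping after a single round without growth is a simplification; on a slow connection a longer `pause` may be needed so that a late batch is not mistaken for the end of the list.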

Requests documentation: https://requests.readthedocs.io/en/latest/

Also, the relevant pandas documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
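As an alternative to requesting 500 results in one call, the pages could be fetched one at a time. This is a sketch under stated assumptions: `fetch_all` and `searchspring_page` are hypothetical helpers, the query parameters are copied from the URL above, and it is assumed (not verified) that the endpoint returns an empty `results` list once the page number passes the last page.

```python
def fetch_all(fetch_page, max_pages=100):
    """Collect results page by page until a page comes back empty.

    fetch_page -- callable taking a 1-based page number and returning the
                  list of result dicts for that page
    """
    collected = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # assumed end-of-data signal
            break
        collected.extend(batch)
    return collected


def searchspring_page(page, per_page=100):
    """Fetch one page from the endpoint used in the answer above."""
    import requests  # imported here so fetch_all can be used without it

    response = requests.get(
        "https://b7i79y.a.searchspring.io/api/search/search.json",
        params={
            "resultsFormat": "native",
            "page": page,
            "resultsPerPage": per_page,
            "sort.ss_days_since_published": "asc",
            "siteId": "b7i79y",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["results"]


# results = fetch_all(searchspring_page)
```

The `max_pages` cap is there only so the loop cannot run forever if the end-of-data assumption turns out to be wrong.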
