
Scraping a dynamic website using BeautifulSoup

I am scraping the website nykaa.com, and the link is https://www.nykaa.com/skin/moisturizers/serums-essence/c/8397?root=nav_3&page_no=1 . There are 25 pages, and the data on each page loads dynamically. I am unable to find the source of the data. Moreover, when I scrape the data I only get 20 distinct products, which are repeated until the list reaches 420 entries.

import requests
from bs4 import BeautifulSoup
import unicodecsv as csv


urls = []
l1 = []


for page in range(1,5):
    result = requests.get("https://www.nykaa.com/skin/moisturizers/serums-essence/c/8397?root=nav_3&page_no=" + str(page))
    src = result.content

    soup = BeautifulSoup(src,'lxml')

    for div_tag in soup.find_all("div", class_ = "card-wrapper-container col-xs-12 col-sm-6 col-md-4"):
        # Note: this inner loop searches the whole page (soup), not div_tag,
        # so every product box is appended once per card wrapper
        for div1_tag in soup.find_all("div", class_ = "product-list-box card desktop-cart"):
            h2_tag = div1_tag.find("h2").find("span")
            price_tag = div1_tag.find("div", class_ = "price-info")
            l1 = [h2_tag.get_text(),price_tag.get_text()]
            urls.append(l1)

            #print(urls)


with open('xyz.csv', 'wb') as myfile:
    wr = csv.writer(myfile)
    wr.writerows(urls)

The above code fetches a list of around 1200 product names and prices, of which only 30 to 40 are unique; the rest are duplicates. I want to fetch the data from all 25 pages without duplicates; there are 486 unique products in total. I also tried using Selenium to click the next-page link, but that didn't work either.
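If the immediate goal is just to drop repeated [name, price] rows from a list like urls before writing the CSV, an order-preserving de-duplication is enough. This is a minimal sketch; unique_rows is a hypothetical helper name. It only removes exact repeats, so two genuinely different products that share the same name and price would be merged, and it does not fix why the rows repeat in the first place.

```python
def unique_rows(rows):
    """Return rows with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [["Serum A", "₹499"], ["Serum B", "₹950"], ["Serum A", "₹499"]]
print(unique_rows(rows))  # [['Serum A', '₹499'], ['Serum B', '₹950']]
```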

This shows making the same request the page itself makes (as seen in the browser's Network tab), in a loop over all pages (including determining the number of pages). results is a list of lists that you can easily write to CSV.

import requests, math, csv

# JSON endpoint the page calls (found in the browser's Network tab)
URL = ('https://www.nykaa.com/gludo/products/list?pro=false&filter_format=v2'
       '&app_version=null&client=react&root=nav_3&page_no={page}&category_id=8397')

results = []

def append_new_rows(data):
    # Keep only entries that have a product name
    for i in data:
        if 'name' in i:
            results.append([i['name'], i['final_price']])

with requests.Session() as s:
    # Page 1 also reports the total number of products
    r = s.get(URL.format(page=1)).json()
    results_per_page = 20
    total_results = r['response']['total_found']
    num_pages = math.ceil(total_results / results_per_page)
    append_new_rows(r['response']['products'])

    # Fetch the remaining pages
    for page in range(2, num_pages + 1):
        r = s.get(URL.format(page=page)).json()
        append_new_rows(r['response']['products'])

with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    w.writerow(['Name', 'Price'])
    w.writerows(results)
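The pagination logic above can be factored so it is testable without network access. A sketch, assuming the endpoint returns JSON shaped as in the answer (response.total_found and response.products[] with name and final_price) and that each page holds 20 results; collect_products and fake_fetch are hypothetical names. With the 486 products mentioned in the question, math.ceil(486 / 20) gives 25, which matches the 25 pages on the site.

```python
import math
from typing import Callable

RESULTS_PER_PAGE = 20  # assumption: same hard-coded value as the answer above


def collect_products(fetch_page: Callable[[int], dict]) -> list:
    """Fetch page 1, derive the page count from total_found, then fetch the rest.

    fetch_page(page_no) must return the decoded JSON of one page, shaped like
    {'response': {'total_found': int, 'products': [...]}}.
    """
    first = fetch_page(1)
    total = first['response']['total_found']
    num_pages = math.ceil(total / RESULTS_PER_PAGE)

    rows = []
    for page in range(1, num_pages + 1):
        data = first if page == 1 else fetch_page(page)  # page 1 is not re-fetched
        for item in data['response']['products']:
            if 'name' in item:  # skip entries without a name, as the answer does
                rows.append([item['name'], item['final_price']])
    return rows


def fake_fetch(page):
    # Hypothetical offline stand-in: 45 products, 20 per page -> 3 pages
    start = (page - 1) * RESULTS_PER_PAGE
    items = [{'name': f'Product {i}', 'final_price': 100 + i}
             for i in range(start, min(start + RESULTS_PER_PAGE, 45))]
    return {'response': {'total_found': 45, 'products': items}}


rows = collect_products(fake_fetch)
print(len(rows), rows[0], rows[-1])  # 45 ['Product 0', 100] ['Product 44', 144]
```

Against the live endpoint you would instead pass a function that calls s.get(...) with the answer's URL for the given page number and returns .json(); keeping the HTTP call outside collect_products is what makes it easy to check with fake data.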

You can use Selenium:

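from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Selenium 4 takes the driver path via a Service object
d = webdriver.Chrome(service=Service('/path/to/chromedriver'))
d.get('https://www.nykaa.com/skin/moisturizers/serums-essence/c/8397')

def text_or_none(tag):
  # Return the element's text, or None if it was not found
  return tag.text if tag else None

def get_products(_d):
  # One dict per product card on the current page
  return [{'title': text_or_none(i.find('div', {'class':'m-content__product-list__title'})),
           'price': text_or_none(i.find('span', {'class':'post-card__content-price-offer'}))}
          for i in _d.find_all('div', {'class':'card-wrapper-container col-xs-12 col-sm-6 col-md-4'})]

s = soup(d.page_source, 'html.parser')
# filter(None, ...) only drops empty dicts, so cards without a title/price still appear as None entries
r = [list(filter(None, get_products(s)))]
# Follow the "next" link until it carries the 'disable-event' class (last page)
while 'disable-event' not in s.find('li', {'class':'next'}).attrs['class']:
  d.get(f"https://www.nykaa.com{s.find('li', {'class':'next'}).a['href']}")
  s = soup(d.page_source, 'html.parser')
  r.append(list(filter(None, get_products(s))))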

Sample output (first three pages):

[[{'title': 'The Face Shop Calendula Essential Moisture Serum', 'price': '₹1320 '}, {'title': 'Palmers Cocoa Butter Formula Skin Perfecting Ultra Hydrating...', 'price': '₹970 '}, {'title': "Cheryl's Cosmeceuticals Clarifi Acne Anti Blemish Serum", 'price': '₹875 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹1250 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹1250 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹3900 '}, {'title': 'Klairs Freshly Juiced Vitamin Drop', 'price': '₹1492 '}, {'title': 'Innisfree The Green Tea Seed Serum', 'price': '₹1950 '}, {'title': "Kiehl's Midnight Recovery Concentrate", 'price': '₹2100 '}, {'title': 'The Face Shop White Seed Brightening Serum', 'price': '₹1990 '}, {'title': 'Biotique Bio Dandelion Visibly Ageless Serum', 'price': '₹230 '}, {'title': None, 'price': None}, {'title': 'St.Botanica Vitamin C 20% + Vitamin E & Hyaluronic Acid Faci...', 'price': '₹1499 '}, {'title': 'Biotique Bio Coconut Whitening & Brightening Cream', 'price': '₹199 '}, {'title': 'Neutrogena Fine Fairness Brightening Serum', 'price': '₹849 '}, {'title': "Kiehl's Clearly Corrective Dark Spot Solution", 'price': '₹4300 '}, {'title': "Kiehl's Clearly Corrective Dark Spot Solution", 'price': '₹4300 '}, {'title': 'Lakme Absolute Perfect Radiance Skin Lightening Serum', 'price': '₹960 '}, {'title': 'St.Botanica Hyaluronic Acid + Vitamin C, E Facial Serum', 'price': '₹1499 '}, {'title': 'Jeva Vitamin C Serum with Hyaluronic Acid for Anti Aging and...', 'price': '₹350 '}, {'title': 'Lotus Professional Phyto-Rx Whitening & Brightening Serum', 'price': '₹595 '}],
 [{'title': 'The Face Shop Chia Seed Moisture Recharge Serum', 'price': '₹1890 '}, {'title': 'Lotus Herbals WhiteGlow Skin Whitening & Brightening Gel Cre...', 'price': '₹280 '}, {'title': 'Lakme 9 to 5 Naturale Aloe Aqua Gel', 'price': '₹200 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹5900 '}, {'title': 'Mixify Unloc Skin Glow Serum', 'price': '₹499 '}, {'title': 'St.Botanica Retinol 2.5% + Vitamin E & Hyaluronic Acid Profe...', 'price': '₹1499 '}, {'title': 'LANEIGE Hydration Combo Set', 'price': '₹3000 '}, {'title': 'Biotique Bio Dandelion Ageless Visiblly Serum', 'price': '₹690 '}, {'title': 'The Moms Co. Natural Vita Rich Face Serum', 'price': '₹699 '}, {'title': "It's Skin Power 10 Formula VC Effector", 'price': '₹950 '}, {'title': "Kiehl's Powerful-Strength Line-Reducing Concentrate", 'price': '₹5100 '}, {'title': 'Olay Natural White Light Instant Glowing Fairness Skin Cream', 'price': '₹99 '}, {'title': 'Plum Green Tea Skin Clarifying Concentrate', 'price': '₹881 '}, {'title': 'Olay Total Effects 7 In One Anti-Ageing Smoothing Serum', 'price': '₹764 '}, {'title': 'Elizabeth Arden Ceramide Daily Youth Restoring Serum 60 Caps...', 'price': '₹5850 '}, {'title': None, 'price': None}, {'title': 'Olay Regenerist Advanced Anti-Ageing Micro-Sculpting Serum', 'price': '₹1699 '}, {'title': 'Lakme Absolute Argan Oil Radiance Overnight Oil-in-Serum', 'price': '₹945 '}, {'title': 'The Face Shop Mango Seed Silk Moisturizing Emulsion', 'price': '₹1890 '}, {'title': 'The Face Shop Calendula Essential Good to Glow Combo', 'price': '₹2557 '}, {'title': 'Garnier Skin Naturals Light Complete Serum Cream', 'price': '₹69 '}],
 [{'title': 'Clinique Moisture Surge Hydrating Supercharged Concentrate', 'price': '₹2550 '}, {'title': 'LANEIGE Sleeping Mask Combo', 'price': '₹3000 '}, {'title': 'Klairs Rich Moist Soothing Serum', 'price': '₹1492 '}, {'title': 'Estee Lauder Idealist Pore Minimizing Skin Refinisher', 'price': '₹5500 '}, {'title': 'O3+ Whitening & Brightening Serum', 'price': '₹1475 '}, {'title': 'Elizabeth Arden Ceramide Daily Youth Restoring Serum 90 Caps...', 'price': '₹6900 '}, {'title': 'Olay Natural White Light Instant Glowing Fairness Skin Cream', 'price': '₹189 '}, {'title': "L'Oreal Paris White Perfect Clinical Expert Anti-Spot Whiten...", 'price': '₹1480 '}, {'title': 'belif Travel Kit', 'price': '₹1499 '}, {'title': 'Forest Essentials Advanced Soundarya Serum With 24K Gold', 'price': '₹3975 '}, {'title': "L'Occitane Immortelle Reset Serum", 'price': '₹4500 '}, {'title': 'Lakme Absolute Skin Gloss Reflection Serum 30ml', 'price': '₹990 '}, {'title': 'Neutrogena Hydro Boost Emulsion', 'price': '₹999 '}, {'title': 'Innisfree Anti-Aging Set', 'price': '₹2350 '}, {'title': 'Clinique Fresh Pressed 7-Day System With Pure Vitamin C', 'price': '₹2400 '}, {'title': 'The Face Shop The Therapy Premier Serum', 'price': '₹2490 '}, {'title': 'The Body Shop Vitamin E Overnight Serum In Oil', 'price': '₹1695 '}, {'title': 'Jeva Vitamin C Serum with Hyaluronic Acid for Anti Aging and...', 'price': '₹525 '}, {'title': 'Olay Regenerist Micro Sculpting Cream and White Radiance Hyd...', 'price': '₹2698 '}, {'title': 'The Face Shop Yehwadam Pure Brightening Serum', 'price': '₹4350 '}]]
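The output above contains {'title': None, 'price': None} entries, and prices are strings with a currency sign and a trailing space. A minimal post-processing sketch for a result shaped like r (a list of per-page lists of dicts); clean_products is a hypothetical helper. It assumes prices are whole numbers, so a decimal price such as '₹99.50' would be misread as 9950.

```python
def clean_products(pages):
    """Flatten per-page lists, drop entries without a title, parse '₹1320 ' to 1320."""
    cleaned = []
    for page in pages:
        for product in page:
            if not product['title']:
                continue  # cards with no title came back as {'title': None, 'price': None}
            price = product['price'] or ''
            digits = ''.join(ch for ch in price if ch.isdigit())  # strips '₹' and spaces
            cleaned.append((product['title'], int(digits) if digits else None))
    return cleaned


sample = [[{'title': 'Klairs Freshly Juiced Vitamin Drop', 'price': '₹1492 '},
           {'title': None, 'price': None}],
          [{'title': 'Mixify Unloc Skin Glow Serum', 'price': '₹499 '}]]
print(clean_products(sample))
# [('Klairs Freshly Juiced Vitamin Drop', 1492), ('Mixify Unloc Skin Glow Serum', 499)]
```

Titles are not de-duplicated here on purpose: the site truncates long names, so identical titles with different prices (for example the Estee Lauder entries) are likely different sizes of a product rather than repeats.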
