Scraping a dynamic website using BeautifulSoup
I am scraping the website nykaa.com; the link is ( https://www.nykaa.com/skin/moisturizers/serums-essence/c/8397?root=nav_3&page_no=1 ). There are 25 pages, and the data for each page loads dynamically. I am unable to find the source of the data. Moreover, when I scrape the data I only get the same 20 products repeated, so the list grows to 420 products.
import requests
from bs4 import BeautifulSoup
import unicodecsv as csv

urls = []
l1 = []
for page in range(1,5):
    result = requests.get("https://www.nykaa.com/skin/moisturizers/serums-essence/c/8397?root=nav_3&page_no=" + str(page))
    src = result.content
    soup = BeautifulSoup(src,'lxml')
    for div_tag in soup.find_all("div", class_ = "card-wrapper-container col-xs-12 col-sm-6 col-md-4"):
        for div1_tag in soup.find_all("div", class_ = "product-list-box card desktop-cart"):
            h2_tag = div1_tag.find("h2").find("span")
            price_tag = div1_tag.find("div", class_ = "price-info")
            l1 = [h2_tag.get_text(),price_tag.get_text()]
            urls.append(l1)
#print(urls)
with open('xyz.csv', 'wb') as myfile:
    wr = csv.writer(myfile)
    wr.writerows(urls)
The above code fetches a list of around 1200 product names and prices, of which only 30 to 40 are unique; the rest are duplicates. I want to fetch the data from all 25 pages without duplicates; there are 486 unique products in total. I also tried using Selenium to click the next-page link, but that did not work either.
This makes the same request that the page itself makes (as seen in the browser's network tab), in a loop over all pages, and includes determining the number of pages. results is a list of lists that you can easily write to CSV.
import requests, math, csv

page = '1'

def append_new_rows(data):
    # Keep only entries that carry a product name
    for i in data:
        if 'name' in i:
            results.append([i['name'], i['final_price']])

with requests.Session() as s:
    # First request: read the total count to work out the number of pages
    r = s.get(f'https://www.nykaa.com/gludo/products/list?pro=false&filter_format=v2&app_version=null&client=react&root=nav_3&page_no={page}&category_id=8397').json()
    results_per_page = 20
    total_results = r['response']['total_found']
    num_pages = math.ceil(total_results/results_per_page)
    results = []
    append_new_rows(r['response']['products'])
    # Remaining pages
    for page in range(2, num_pages + 1):
        r = s.get(f'https://www.nykaa.com/gludo/products/list?pro=false&filter_format=v2&app_version=null&client=react&root=nav_3&page_no={page}&category_id=8397').json()
        append_new_rows(r['response']['products'])

with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter = ",", quoting=csv.QUOTE_MINIMAL)
    w.writerow(['Name','Price'])
    for row in results:
        w.writerow(row)
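The page-count arithmetic and the row filtering used above can be checked offline, without requesting anything from the site. The following is only a minimal sketch: `sample_payload` is made-up data that merely mimics the `response` / `total_found` / `products` shape read by the code above, and `count_pages` and `extract_rows` are hypothetical helper names, not part of any library.

```python
import math

RESULTS_PER_PAGE = 20  # page size assumed by the answer above


def count_pages(total_found, per_page=RESULTS_PER_PAGE):
    """Return how many pages are needed to list total_found items."""
    return math.ceil(total_found / per_page)


def extract_rows(products):
    """Return [name, final_price] for every entry that has a 'name' key."""
    return [[p['name'], p['final_price']] for p in products if 'name' in p]


# Made-up payload for illustration only; a real response may look different.
sample_payload = {
    'response': {
        'total_found': 486,
        'products': [
            {'name': 'Example Serum A', 'final_price': 100},
            {'banner': 'not a product'},  # no 'name' key, so it is skipped
            {'name': 'Example Serum B', 'final_price': 250},
        ],
    }
}

num_pages = count_pages(sample_payload['response']['total_found'])
rows = extract_rows(sample_payload['response']['products'])
print(num_pages)  # 25, because ceil(486 / 20) == 25
print(rows)
```

With 486 products and 20 per page this gives 25 pages, which matches the page count mentioned in the question.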
You can use selenium:
from bs4 import BeautifulSoup as soup
from selenium import webdriver

d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://www.nykaa.com/skin/moisturizers/serums-essence/c/8397')

def get_products(_d):
    return [{'title': (lambda x: x if not x else x.text)(i.find('div', {'class': 'm-content__product-list__title'})),
             'price': (lambda x: x if not x else x.text)(i.find('span', {'class': 'post-card__content-price-offer'}))}
            for i in _d.find_all('div', {'class': 'card-wrapper-container col-xs-12 col-sm-6 col-md-4'})]

s = soup(d.page_source, 'html.parser')
r = [list(filter(None, get_products(s)))]
while 'disable-event' not in s.find('li', {'class':'next'}).attrs['class']:
    d.get(f"https://www.nykaa.com{s.find('li', {'class':'next'}).a['href']}")
    s = soup(d.page_source, 'html.parser')
    r.append(list(filter(None, get_products(s))))
Sample output (first three pages):
[[{'title': 'The Face Shop Calendula Essential Moisture Serum', 'price': '₹1320 '}, {'title': 'Palmers Cocoa Butter Formula Skin Perfecting Ultra Hydrating...', 'price': '₹970 '}, {'title': "Cheryl's Cosmeceuticals Clarifi Acne Anti Blemish Serum", 'price': '₹875 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹1250 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹1250 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹3900 '}, {'title': 'Klairs Freshly Juiced Vitamin Drop', 'price': '₹1492 '}, {'title': 'Innisfree The Green Tea Seed Serum', 'price': '₹1950 '}, {'title': "Kiehl's Midnight Recovery Concentrate", 'price': '₹2100 '}, {'title': 'The Face Shop White Seed Brightening Serum', 'price': '₹1990 '}, {'title': 'Biotique Bio Dandelion Visibly Ageless Serum', 'price': '₹230 '}, {'title': None, 'price': None}, {'title': 'St.Botanica Vitamin C 20% + Vitamin E & Hyaluronic Acid Faci...', 'price': '₹1499 '}, {'title': 'Biotique Bio Coconut Whitening & Brightening Cream', 'price': '₹199 '}, {'title': 'Neutrogena Fine Fairness Brightening Serum', 'price': '₹849 '}, {'title': "Kiehl's Clearly Corrective Dark Spot Solution", 'price': '₹4300 '}, {'title': "Kiehl's Clearly Corrective Dark Spot Solution", 'price': '₹4300 '}, {'title': 'Lakme Absolute Perfect Radiance Skin Lightening Serum', 'price': '₹960 '}, {'title': 'St.Botanica Hyaluronic Acid + Vitamin C, E Facial Serum', 'price': '₹1499 '}, {'title': 'Jeva Vitamin C Serum with Hyaluronic Acid for Anti Aging and...', 'price': '₹350 '}, {'title': 'Lotus Professional Phyto-Rx Whitening & Brightening Serum', 'price': '₹595 '}],
 [{'title': 'The Face Shop Chia Seed Moisture Recharge Serum', 'price': '₹1890 '}, {'title': 'Lotus Herbals WhiteGlow Skin Whitening & Brightening Gel Cre...', 'price': '₹280 '}, {'title': 'Lakme 9 to 5 Naturale Aloe Aqua Gel', 'price': '₹200 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹5900 '}, {'title': 'Mixify Unloc Skin Glow Serum', 'price': '₹499 '}, {'title': 'St.Botanica Retinol 2.5% + Vitamin E & Hyaluronic Acid Profe...', 'price': '₹1499 '}, {'title': 'LANEIGE Hydration Combo Set', 'price': '₹3000 '}, {'title': 'Biotique Bio Dandelion Ageless Visiblly Serum', 'price': '₹690 '}, {'title': 'The Moms Co. Natural Vita Rich Face Serum', 'price': '₹699 '}, {'title': "It's Skin Power 10 Formula VC Effector", 'price': '₹950 '}, {'title': "Kiehl's Powerful-Strength Line-Reducing Concentrate", 'price': '₹5100 '}, {'title': 'Olay Natural White Light Instant Glowing Fairness Skin Cream', 'price': '₹99 '}, {'title': 'Plum Green Tea Skin Clarifying Concentrate', 'price': '₹881 '}, {'title': 'Olay Total Effects 7 In One Anti-Ageing Smoothing Serum', 'price': '₹764 '}, {'title': 'Elizabeth Arden Ceramide Daily Youth Restoring Serum 60 Caps...', 'price': '₹5850 '}, {'title': None, 'price': None}, {'title': 'Olay Regenerist Advanced Anti-Ageing Micro-Sculpting Serum', 'price': '₹1699 '}, {'title': 'Lakme Absolute Argan Oil Radiance Overnight Oil-in-Serum', 'price': '₹945 '}, {'title': 'The Face Shop Mango Seed Silk Moisturizing Emulsion', 'price': '₹1890 '}, {'title': 'The Face Shop Calendula Essential Good to Glow Combo', 'price': '₹2557 '}, {'title': 'Garnier Skin Naturals Light Complete Serum Cream', 'price': '₹69 '}],
 [{'title': 'Clinique Moisture Surge Hydrating Supercharged Concentrate', 'price': '₹2550 '}, {'title': 'LANEIGE Sleeping Mask Combo', 'price': '₹3000 '}, {'title': 'Klairs Rich Moist Soothing Serum', 'price': '₹1492 '}, {'title': 'Estee Lauder Idealist Pore Minimizing Skin Refinisher', 'price': '₹5500 '}, {'title': 'O3+ Whitening & Brightening Serum', 'price': '₹1475 '}, {'title': 'Elizabeth Arden Ceramide Daily Youth Restoring Serum 90 Caps...', 'price': '₹6900 '}, {'title': 'Olay Natural White Light Instant Glowing Fairness Skin Cream', 'price': '₹189 '}, {'title': "L'Oreal Paris White Perfect Clinical Expert Anti-Spot Whiten...", 'price': '₹1480 '}, {'title': 'belif Travel Kit', 'price': '₹1499 '}, {'title': 'Forest Essentials Advanced Soundarya Serum With 24K Gold', 'price': '₹3975 '}, {'title': "L'Occitane Immortelle Reset Serum", 'price': '₹4500 '}, {'title': 'Lakme Absolute Skin Gloss Reflection Serum 30ml', 'price': '₹990 '}, {'title': 'Neutrogena Hydro Boost Emulsion', 'price': '₹999 '}, {'title': 'Innisfree Anti-Aging Set', 'price': '₹2350 '}, {'title': 'Clinique Fresh Pressed 7-Day System With Pure Vitamin C', 'price': '₹2400 '}, {'title': 'The Face Shop The Therapy Premier Serum', 'price': '₹2490 '}, {'title': 'The Body Shop Vitamin E Overnight Serum In Oil', 'price': '₹1695 '}, {'title': 'Jeva Vitamin C Serum with Hyaluronic Acid for Anti Aging and...', 'price': '₹525 '}, {'title': 'Olay Regenerist Micro Sculpting Cream and White Radiance Hyd...', 'price': '₹2698 '}, {'title': 'The Face Shop Yehwadam Pure Brightening Serum', 'price': '₹4350 '}]]
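The sample output still contains `{'title': None, 'price': None}` entries. This is because `filter(None, ...)` only drops falsy items, and a dict that has keys is truthy even when all of its values are None. Some title/price pairs also appear more than once. A minimal post-processing sketch follows, assuming the result has the list-of-pages shape shown above. `clean_products` is a hypothetical helper name, and `pages` is a shortened excerpt of the sample, not a complete result.

```python
def clean_products(pages, dedupe=False):
    """Flatten a list of pages into one list and drop empty entries.

    With dedupe=True, repeated (title, price) pairs are kept only once.
    Truncated titles can make distinct variants look identical, so
    deduplication is off by default.
    """
    seen = set()
    cleaned = []
    for page in pages:
        for item in page:
            title, price = item['title'], item['price']
            if title is None and price is None:
                continue  # placeholder card without product data
            key = (title, price)
            if dedupe and key in seen:
                continue
            seen.add(key)
            # Strip the trailing space seen in the scraped prices
            cleaned.append({'title': title,
                            'price': price.strip() if price else price})
    return cleaned


# Shortened excerpt of the sample output, for illustration only.
pages = [
    [
        {'title': 'Klairs Freshly Juiced Vitamin Drop', 'price': '₹1492 '},
        {'title': None, 'price': None},
        {'title': "Kiehl's Clearly Corrective Dark Spot Solution", 'price': '₹4300 '},
        {'title': "Kiehl's Clearly Corrective Dark Spot Solution", 'price': '₹4300 '},
    ],
    [
        {'title': 'Mixify Unloc Skin Glow Serum', 'price': '₹499 '},
    ],
]

flat = clean_products(pages)
unique = clean_products(pages, dedupe=True)
print(len(flat), len(unique))  # 4 3
```

Whether to deduplicate is a judgment call: entries with the same shortened title and price may be different sizes or variants of a product, so a product URL or ID, if one is available, would be a safer key.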