
Unable to scrape websites using python

I am practicing web scraping, so I picked this website: https://www.dunzo.com/bangalore/nilgiris-supermarket-koramangala-ejipura

This is the code I am using:

import requests
from bs4 import BeautifulSoup

url="https://www.dunzo.com/bangalore/nilgiris-supermarket-koramangala-ejipura"
r=requests.get(url)
htmlcontnent=r.content 
soup=BeautifulSoup(htmlcontnent,'html.parser')
elem=soup.select('.hozIhp')
print(elem)

Now I get this output:

[<p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Britannia Sweet Bread</p>, <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Britannia Sweet Bun</p>, <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Nilgiris Cheese Garlic Bread</p>, <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Nilgiris Fruit Bread</p>, <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Nilgiris Pav Bun</p>, <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Nilgiri's Broken Wheat Bread</p>, <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Nilgiri's Garlic Bread</p>, <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Nilgiri's Multi Grain Bread</p>, <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Nilgiri's Whole Wheat Bread</p>, <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Nilgiri's Whole Wheat Brown Bread</p>]

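So the output comes back as a list. Now I want to extract the item names, such as Britannia Sweet Bread, Britannia Sweet Bun, Nilgiris Cheese Garlic Bread and so on. I tried a few things, such as adding .text to soup, but it did not work. Can someone help me with how to do this?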

Try this:

import requests
from bs4 import BeautifulSoup

url="https://www.dunzo.com/bangalore/nilgiris-supermarket-koramangala-ejipura"
r=requests.get(url)
htmlcontnent=r.content 
soup=BeautifulSoup(htmlcontnent,'html.parser')
elem=soup.select('.hozIhp')
# take .text from each element and print one name per line
print(*[el.text for el in elem], sep="\n")

Output:

Britannia Sweet Bread
Britannia Sweet Bun
Nilgiris Cheese Garlic Bread
Nilgiris Fruit Bread
Nilgiris Pav Bun
Nilgiri's Broken Wheat Bread
Nilgiri's Garlic Bread
Nilgiri's Multi Grain Bread
Nilgiri's Whole Wheat Bread
Nilgiri's Whole Wheat Brown Bread

import requests
from bs4 import BeautifulSoup

url="https://www.dunzo.com/bangalore/nilgiris-supermarket-koramangala-ejipura"
r=requests.get(url)
htmlcontnent=r.content 
soup=BeautifulSoup(htmlcontnent,'html.parser')
elem=soup.select('.hozIhp')
#add to your code: elem is a list, so read .text from each item
for item in elem:
    print(item.text)
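
For anyone wondering why adding .text directly did not work: select() returns a list-like result, and .text belongs to the individual tags inside it. Here is a minimal sketch of that, using a short inline HTML string as a stand-in for the downloaded page (the sample markup is an assumption so the snippet runs without network access):

```python
from bs4 import BeautifulSoup

# Inline sample markup standing in for the downloaded page (an assumption,
# so the example runs offline).
sample_html = """
<div>
  <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Britannia Sweet Bread</p>
  <p class="sc-1gu8y64-0 dlNpIS sc-1twyv6b-1 hozIhp">Britannia Sweet Bun</p>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
elem = soup.select(".hozIhp")  # list-like collection of Tag objects

# .text lives on each Tag, not on the collection, so this raises AttributeError.
try:
    elem.text
    list_has_text = True
except AttributeError:
    list_has_text = False

# Reading .text per element works.
names = [el.text for el in elem]

print(list_has_text)
print(names)
```

The same per-element access is what both the list comprehension and the for loop above are doing.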

The problem you are running into: the page loads its content dynamically, so requests cannot fetch the whole page.
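
One rough way to see the difference is to count how many matching elements each version of the HTML contains. The sketch below uses two made-up strings, static_html and rendered_html, as stand-ins for what requests returns and what a browser ends up with after scrolling; both strings and the helper count_items are hypothetical and only illustrate the idea:

```python
from bs4 import BeautifulSoup


def count_items(html, selector=".hozIhp"):
    """Return how many elements in the given HTML match the CSS selector."""
    return len(BeautifulSoup(html, "html.parser").select(selector))


# Hypothetical stand-in for the HTML that requests.get() returns.
static_html = '<div id="root"><p class="hozIhp">Britannia Sweet Bread</p></div>'

# Hypothetical stand-in for the HTML after the browser has loaded more items.
rendered_html = (
    '<div id="root">'
    '<p class="hozIhp">Britannia Sweet Bread</p>'
    '<p class="hozIhp">Nilgiri\'s Milk Bread</p>'
    '<p class="hozIhp">Eggs</p>'
    '</div>'
)

print(count_items(static_html), count_items(rendered_html))
```

If the count from r.content is lower than what you see in the browser, the missing items are being added by JavaScript after the initial response.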

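To fix this you need a bit more setup. Install Selenium with pip install selenium, then download the Google Chrome web driver that is compatible with your browser from https://chromedriver.chromium.org/downloads (Google Chrome must be installed on your computer). Put the web driver in the same folder as the python script.

Then run this code: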

from bs4 import BeautifulSoup
from selenium import webdriver
import time

# Selenium 3 style; Selenium 4 takes the driver path through a Service object instead
browser = webdriver.Chrome(executable_path="chromedriver")

url="https://www.dunzo.com/bangalore/nilgiris-supermarket-koramangala-ejipura"
browser.get(url)

# scroll to the bottom 6 times so the remaining content gets loaded
for i in range(6):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    # wait 5 seconds for the content to load (adjust the value to your internet speed)
    time.sleep(5)

# parse the rendered page source instead of r.content from requests
html = browser.page_source
soup=BeautifulSoup(html,'html.parser')
elem=soup.select('.hozIhp')
for item in elem:
    print(item.text)

browser.close()

Output:

Britannia Sweet Bread
Britannia Sweet Bun
Nilgiris Cheese Garlic Bread
Nilgiris Fruit Bread
Nilgiris Pav Bun
Nilgiri's Broken Wheat Bread
Nilgiri's Garlic Bread
Nilgiri's Multi Grain Bread
Nilgiri's Whole Wheat Bread
Nilgiri's Whole Wheat Brown Bread
Nilgiri's Milk Bread
Nilgiri's Sandwich Bread
Bajaj White Eggs Gold Pack
Suguna Healthy Eggs
Eggs
Nandini - Shubham Pasteurized Standardized Milk
Nandini Good Life Slim Milk
Nilgiris Lite Milk
Nilgiris Double Toned Milk
Nilgiris Full Cream Milk
Nilgiri's Rich Milk
Amul Premium Dahi
Amul Cheese Slices A+
Cavin's Curd Pouch
Epigamia Mishti Doi
Id Natural Curd
Milky Mist Mango Yogurt
Nilgiris Curd Lite
Nilgiris Low Fat Probiotic Curd
Nilgiris Paneer
Nestle A+ Nourish Dahi
Nilgiris Natural Curd Set
Nilgiris Butter Milk
Nilgiri's Toned Milk Curd Pouch
Nilgiri's Lite Curd Pouch
Nilgiri's Malai Paneer
Soulfull Choco And Vanilla Fills - Ragi Bites
Soulfull Choco Fills - Ragi Bites
Soulfull Vanilla Fills - Ragi Bites
Soulfull Strawberry Fills - Ragi Bites
Soulfull Diet Millet Muesli
Soulfull Fruit & Nut Millet Muesli
Soulfull Crunchy Millet Muesli
Soulfull Baked Desi Muesli - Chatpata
Soulfull Baked Desi Muesli - Masala
Kellogg's Corn Flakes
Fortune Mini Soya Chunks
Kellogg's Chocos Moon And Stars
Soulfull Millet Smoothix - Cocoa Lite Protein Drink Sachets
Soulfull Millet Smoothix - Almond Protein Drink Sachets
Soulfull Millet Smoothix - Almond Protein Drink Sachets
Soulfull Millet Smoothix - Cocoa Lite Protein Drink Sachets

As stated in the documentation, you can use get_text() to extract the text from a document or a tag.
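
A small sketch of get_text() on a single tag, again with an inline HTML string as an assumed stand-in for the real page. It also shows the optional separator and strip arguments, which help when a tag has nested markup or stray whitespace:

```python
from bs4 import BeautifulSoup

# Inline sample markup (an assumption) with extra whitespace and a nested tag.
sample_html = '<p class="hozIhp">  Nilgiris <b>Fruit</b> Bread </p>'

tag = BeautifulSoup(sample_html, "html.parser").select_one(".hozIhp")

plain = tag.get_text()                 # all text, whitespace kept as-is
clean = tag.get_text(" ", strip=True)  # pieces stripped, joined by one space

print(repr(plain))
print(repr(clean))
```

For the product names in the question, calling get_text() on each element of elem gives the same names as reading .text.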

