
Python (BeautifulSoup) only returns 1 result

I know similar questions have already been answered, but I tried applying those answers and they didn't fix my problem.

My problem is that this page: http://books.toscrape.com/catalogue/page-1.html has 20 prices, but when I try to scrape them I only get the first price and not the other 19.

Here's the code:

from bs4 import BeautifulSoup
import requests
URL = 'http://books.toscrape.com/catalogue/page-1.html'
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find_all("div", class_="col-sm-8 col-md-9")

for i in results:
    prices = i.find("p", class_="price_color")
    print(prices.text.strip())
    print()
    print()

You are searching for the items in the wrong way.

There is only one div with the class col-sm-8 col-md-9, and it contains many prices. Your code, however, expects many divs with a single price in each one, and that is the problem: the loop runs only once.

find() returns only the first matching price inside this div. You should use find_all() to get all the prices inside this single div.

div = soup.find("div", class_="col-sm-8 col-md-9")

prices = div.find_all("p", class_="price_color")

for i in prices:
    print(i.text.strip())
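To see why the original loop printed only one price, here is a minimal offline sketch. SAMPLE_HTML is a hypothetical, heavily simplified stand-in for the real page (one container div holding three made-up prices), so nothing is downloaded. It shows that the container search yields a single div, that find() stops at the first price, and that find_all() returns all of them.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: ONE container div with several prices.
SAMPLE_HTML = """
<div class="col-sm-8 col-md-9">
  <article><p class="price_color">£10.00</p></article>
  <article><p class="price_color">£20.00</p></article>
  <article><p class="price_color">£30.00</p></article>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The question's search finds just one container div...
results = soup.find_all("div", class_="col-sm-8 col-md-9")
print(len(results))  # 1, so the loop body runs only once

# ...and find() inside that div returns only the first match.
first = results[0].find("p", class_="price_color")
print(first.text.strip())  # £10.00

# find_all() inside the same div returns every match.
all_prices = [p.text.strip() for p in results[0].find_all("p", class_="price_color")]
print(all_prices)  # ['£10.00', '£20.00', '£30.00']
```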

You could even search for the prices directly:

prices = soup.find_all("p", class_="price_color")

for i in prices:
    print(i.text.strip())
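The same searches can also be written as CSS selectors with select(). The sketch below uses a hypothetical snippet (not the real page) with one extra price placed outside the container, to show the difference between searching the whole document and searching only inside the div. One detail worth knowing, per the BeautifulSoup docs: class_="col-sm-8 col-md-9" matches that exact attribute string, while chained CSS class selectors match regardless of class order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two prices inside the container, one outside it.
SAMPLE_HTML = """
<div class="col-sm-8 col-md-9">
  <p class="price_color">£10.00</p>
  <p class="price_color">£20.00</p>
</div>
<p class="price_color">£99.99</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Like find_all(), select() returns every element matching the CSS selector.
everywhere = [p.get_text(strip=True) for p in soup.select("p.price_color")]

# Chained classes (.col-md-9.col-sm-8) match in any order; restrict to the div.
inside_div = [p.get_text(strip=True) for p in soup.select("div.col-md-9.col-sm-8 p.price_color")]

print(everywhere)  # ['£10.00', '£20.00', '£99.99']
print(inside_div)  # ['£10.00', '£20.00']
```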

Minimal working example:

from bs4 import BeautifulSoup
import requests

url = 'http://books.toscrape.com/catalogue/page-1.html'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

div = soup.find("div", class_="col-sm-8 col-md-9")

# search only inside the single container div
prices = div.find_all("p", class_="price_color")

for i in prices:
    print(i.text.strip())

Using find() to search for a price works only if you first find all the regions that each contain a single price, such as article.

Every book is in a separate article element, so there are many articles, and each article has a single price (and a single title, a single image, etc.).

from bs4 import BeautifulSoup
import requests

url = 'http://books.toscrape.com/catalogue/page-1.html'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

results = soup.find_all("article")

for i in results:
    title = i.find("h3")
    print('title:', title.text.strip())

    price = i.find("p", class_="price_color")
    print('price:', price.text.strip())

    print('---')
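The titles in the result below are truncated because the h3 link text is shortened. A minimal sketch, assuming (as on the real page) that each h3 > a element keeps the full title in its title attribute, reads that attribute instead and collects each book into a dict. SAMPLE_HTML is a hypothetical two-book snippet reusing prices from the output below.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking two book cards: the link text is shortened,
# the full title lives in the title="..." attribute (assumed page structure).
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
</article>
<article class="product_pod">
  <h3><a title="Olio">Olio</a></h3>
  <p class="price_color">£23.88</p>
</article>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

books = []
for article in soup.find_all("article"):
    link = article.find("h3").find("a")
    books.append({
        # link.text is the truncated label; link["title"] is the full title
        "title": link["title"],
        "price": article.find("p", class_="price_color").text.strip(),
    })

for book in books:
    print(book["title"], book["price"])
```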

Result:

title: A Light in the ...
price: £51.77
---
title: Tipping the Velvet
price: £53.74
---
title: Soumission
price: £50.10
---
title: Sharp Objects
price: £47.82
---
title: Sapiens: A Brief History ...
price: £54.23
---
title: The Requiem Red
price: £22.65
---
title: The Dirty Little Secrets ...
price: £33.34
---
title: The Coming Woman: A ...
price: £17.93
---
title: The Boys in the ...
price: £22.60
---
title: The Black Maria
price: £52.15
---
title: Starving Hearts (Triangular Trade ...
price: £13.99
---
title: Shakespeare's Sonnets
price: £20.66
---
title: Set Me Free
price: £17.46
---
title: Scott Pilgrim's Precious Little ...
price: £52.29
---
title: Rip it Up and ...
price: £35.02
---
title: Our Band Could Be ...
price: £57.25
---
title: Olio
price: £23.88
---
title: Mesaerion: The Best Science ...
price: £37.59
---
title: Libertarianism for Beginners
price: £51.33
---
title: It's Only the Himalayas
price: £45.17
---

This code should work:

import requests
from bs4 import BeautifulSoup


URL = 'http://books.toscrape.com/catalogue/page-1.html'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

list_of_books = soup.select(
    # CSS selector copied from Chrome DevTools
    '#default > div > div > div > div > section > div:nth-child(2) > ol > li'
)

for book in list_of_books:
    price = book.find('p', {'class': 'price_color'})
    print(price.text.strip())

I just copied this selector from Chrome DevTools.
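A full path copied from DevTools depends on every level of nesting, so it silently returns nothing if the layout changes. A minimal sketch of a shorter, class-based alternative follows; the class names row and product_pod are assumed to match the real page, while the surrounding markup in SAMPLE_HTML is invented and deliberately lacks the exact div nesting the long path expects.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes as assumed on the real page, different nesting.
SAMPLE_HTML = """
<body id="default">
  <section>
    <ol class="row">
      <li><article class="product_pod"><p class="price_color">£51.77</p></article></li>
      <li><article class="product_pod"><p class="price_color">£53.74</p></article></li>
    </ol>
  </section>
</body>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The copied full path finds nothing here because the div levels are missing.
long_path = soup.select("#default > div > div > div > div > section > div:nth-child(2) > ol > li")
print(len(long_path))  # 0

# Class-based selectors do not care about the exact nesting.
prices = [p.get_text(strip=True) for p in soup.select("ol.row > li p.price_color")]
short = [p.get_text(strip=True) for p in soup.select("article.product_pod p.price_color")]
print(prices)  # ['£51.77', '£53.74']
print(short)   # ['£51.77', '£53.74']
```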

You are using find and find_all in the wrong places.

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM