
Python - Beautiful Soup Select only returning []

I am working through a Python tutorial on Udemy (I am a total newbie to Python). I am at the Beautiful Soup section, where the exercise is to scrape the price of the author's book from Amazon. My code is below:

import bs4, requests
url = 'https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
response.raise_for_status()
soup = bs4.BeautifulSoup(response.text, 'html.parser')
soup.select('#addToCart > a > h5 > div > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')

When I inspect the path of the element of the price, I can see this:

<span class="a-size-medium a-color-price header-price"> 


            $25.45



    </span>

However, when I paste that selector into soup.select and run the script, all I get back is [], i.e. an empty list. I expected to get the contents of the second code box.

UPDATE: While I was typing this question, the code did return the correct result (the contents of the box with $25.45), but 5 minutes later it went back to returning only []. I am behind a proxy, and I have also tried without the proxy, with no change in results. I don't get any error from response.raise_for_status() either. Can someone please assist?

(Note that I don't intend to screen-scrape any commercial site; I would like to apply what I learn to in-house scenarios.)

Thank you!

You are over-complicating your CSS selector and making it fragile - heavily dependent on the page layout. You don't have to go through the complete parent-child chain to locate an element. Choose the most reliable, readable and appropriate points you can base your locator on. For instance, in this case, the following works for me:

soup.select('#addToCart .header-price')
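To illustrate the shorter selector without depending on the live page, here is a minimal sketch that runs it against a small, hypothetical HTML fragment modelled on the markup in the question. The fragment and the extract_price helper are assumptions made for this example, not something taken from the tutorial or from Amazon's real page. The helper also checks for an empty result, since select returns an empty list whenever nothing matches; one possible reason for that (not confirmed in this thread) is that the HTML served to requests differs from what the browser shows.

```python
import bs4

# Hypothetical, simplified markup standing in for the product page.
html = """
<div id="addToCart">
  <div class="a-row">
    <div class="a-column a-span4 a-text-right a-span-last">
      <span class="a-size-medium a-color-price header-price">

            $25.45

      </span>
    </div>
  </div>
</div>
"""

def extract_price(markup):
    """Return the price text, or None if the selector matches nothing."""
    soup = bs4.BeautifulSoup(markup, 'html.parser')
    # Descendant selector: any .header-price element anywhere inside #addToCart.
    matches = soup.select('#addToCart .header-price')
    if not matches:
        # select() returned []; the element is not in this markup.
        return None
    # strip=True removes the surrounding whitespace and newlines.
    return matches[0].get_text(strip=True)

print(extract_price(html))
print(extract_price('<html><body>no price here</body></html>'))
```

When the helper returns None for a real response, printing part of response.text is a quick way to check whether the expected element is present in what the server actually sent.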
