
soup.select returns empty list. Need help finding the source of the css code

I am currently going through the 'Automate the Boring Stuff' Udemy course, lesson '40. Parsing HTML with the Beautiful Soup Module'. Partway through the lesson, Al requests the HTML of an Amazon page and uses soup.select with the price's selector to print the price. I am trying to do the same with the exact same code, except that I pass headers, which seems to be necessary; otherwise I get a server error. I have read through some similar questions, and the general solution seems to be to find the source of the data using the network panel. Unfortunately, I have no clue how to do that. :/

import requests
import bs4
headers = {'User-Agent': 'Chrome'}
url = 'https://www.amazon.com/Automate-Boring-Stuff-Python-Programming-ebook/dp/B00WJ049VU/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=&sr='
res = requests.get(url, headers=headers)
soup = bs4.BeautifulSoup(res.text, features='html.parser')
print(soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price'))

You need to use a more forgiving parser, such as lxml (installed separately with pip install lxml), instead of html.parser. You can also use a much shorter and more robust selector.

import requests
import bs4

headers = {'User-Agent': 'Chrome'}
url = 'https://www.amazon.com/Automate-Boring-Stuff-Python-Programming-ebook/dp/B00WJ049VU/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=&sr='
res = requests.get(url, headers=headers)
soup = bs4.BeautifulSoup(res.text, features='lxml')
print(soup.select_one('.mediaTab_subtitle').text.strip())
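
One caveat with the snippet above: select_one returns None when nothing matches, so calling .text on the result raises AttributeError if the server sends back a different page than expected. Below is a minimal sketch of a more defensive version. The helper name first_text and the sample HTML are made up for illustration and are not Amazon's real markup; the sketch uses html.parser so it runs without lxml, but you can pass parser='lxml' if it is installed.

```python
import bs4


def first_text(html, selector, parser="html.parser"):
    """Return the stripped text of the first match for selector, or None."""
    soup = bs4.BeautifulSoup(html, features=parser)
    tag = soup.select_one(selector)  # None when nothing matches
    if tag is None:
        return None
    return tag.get_text(strip=True)


# Hypothetical sample markup, not Amazon's real page.
sample_html = '<div><span class="mediaTab_subtitle">  $17.59 </span></div>'

found = first_text(sample_html, ".mediaTab_subtitle")
missing = first_text(sample_html, ".no-such-class")
print(found)
print(missing)
```

With a real response you would call first_text(res.text, '.mediaTab_subtitle', parser='lxml') and check for None before printing.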

For better results, open Inspect Element in your browser and click the arrow icon in the top-left corner of the developer tools panel to activate the element picker. Then hover over the element you want and click to select it. Once it is selected, you can copy its XPath, CSS selector, class, or id.
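
A selector copied from the developer tools tends to spell out the full path to the element, which makes it brittle: it stops matching as soon as any class along the path changes. The sketch below illustrates this on hypothetical markup loosely modeled on the selector from the question (it is not Amazon's real page), comparing the copied selector with a shorter one. Keep in mind that the developer tools show the page as rendered in the browser, which may differ from the HTML that requests receives.

```python
import bs4

# Hypothetical markup, loosely modeled on the selector in the question.
html = """
<div id="mediaNoAccordion">
  <div class="a-row">
    <div class="a-column a-span4 a-text-right a-span-last">
      <span class="a-size-medium a-color-price header-price">$17.59</span>
    </div>
  </div>
</div>
"""

# Long selector as copied from the developer tools.
copied = ("#mediaNoAccordion > div.a-row > "
          "div.a-column.a-span4.a-text-right.a-span-last > "
          "span.a-size-medium.a-color-price.header-price")
# Shorter selector that relies on a single class.
short = "span.header-price"


def texts(markup, selector):
    """Return the stripped text of every element matching selector."""
    soup = bs4.BeautifulSoup(markup, features="html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


# Both selectors match the original markup.
copied_matches = texts(html, copied)
short_matches = texts(html, short)

# Simulate a small layout change: one class on the path is renamed.
changed = html.replace('class="a-row"', 'class="a-section"')
copied_after = texts(changed, copied)  # path no longer matches
short_after = texts(changed, short)    # still finds the price

print(copied_matches, short_matches)
print(copied_after, short_after)
```

If soup.select returns an empty list on the real page, printing len(res.text) or saving res.text to a file and searching it for the price is a quick way to check whether the element is in the response at all.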
