简体   繁体   中英

How to find a particular class in BeautifulSoup using findAll

FindAll doesn't find the class I need. However I was able to find the class above that one, but the data structure is not that well organized.

  • Do you know what can we do to get the data or organize the output from the class above which has all the data together ?

Please see the HTML below and the images.

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.vivino.com/explore?e=eJzLLbI11jNVy83MszU0UMtNrLA1MVBLrrQtLVYrsDVUK7ZNTlQrS7YtKSpNVSsviY4FioEpIwhlDKFMIJQ5VM4EAJCfGxQ='

#Opening a connection
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parse
page_soup = soup(page_html, "html.parser")
container = page_soup.findAll("div", {"class":"wine-explorer__results__item"})
len(container)

在此处输入图片说明

在此处输入图片说明

You can use Dryscrape module with bs4 because wine-explorer selector is created by javascript. Dryscrape module helps you for javascript support.

尝试改用以下内容:

container = page_soup.findAll("div", {"class": "wine-explorer__results"})

Thanks everyone, as you all suggested a module to read Javascript was needed to select that class. I've used selenium in this case, however PyQt5 might be a better option.

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
from selenium import webdriver

my_url = 'https://www.vivino.com/explore?e=eJzLLbI11jNVy83MszU0UMtNrLA1MVBLrrQtLVYrsDVUK7ZNTlQrS7YtKSpNVSsviY4FioEpIwhlDKFMIJQ5VM4EAJCfGxQ='

#Opening a connection

#html parse

web_r = uReq(my_url)

driver=webdriver.Firefox()
driver.get(my_url)
page_soup = soup(web_r, "html.parser")
html = driver.execute_script("return document.documentElement.outerHTML")

#print(html)
html_page_soup = soup(html, "html.parser")
container = html_page_soup.findAll("div", {"class": "wine-explorer__results__item"})
len(container)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM