
BeautifulSoup4 can't find sections

I'm trying to learn how to parse HTML pages with Python. When I tried to extract some links from the site's sections, I noticed that the result of soup.find_all(), which should contain every section with the specified class, is empty. Is this normal, or am I doing something wrong?

Code:

from bs4 import BeautifulSoup
import requests

URL = "https://auto.ria.com/newauto/marka-jeep/"

HEADERS = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
            "accept":"*/*"}

def get_html(url, params=None):
    req = requests.get(url, headers=HEADERS, params=params)
    return req

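def get_content(html):
    soup = BeautifulSoup(html, "html.parser")
    # No trailing space in the class name: "proposition " never matches the class token "proposition"
    section_items = soup.find_all("section", class_="proposition")
    cars = []
    for item in section_items:
        title = item.find("div", class_="proposition_title")
        link = item.find("a", class_="proposition_link")
        # Skip sections without the expected children instead of crashing on None
        if title is None or link is None:
            continue
        cars.append({
            "title": title.get_text(strip=True),  # the missing comma here was a SyntaxError
            "link": link.get("href"),
        })
    print(cars)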

def parse():
    html = get_html(URL)
    if html.status_code == 200:
        get_content(html.text)
    else:
        print("Something went wrong")

parse()
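An empty result can mean either that the selector is wrong or that the served HTML never contained the sections (for example, if they are added later by JavaScript). A minimal sketch for telling the two apart, using only the standard library's html.parser.HTMLParser, is below; SectionClassCollector and count_sections are hypothetical names for illustration. It also shows why class_="proposition " (with a trailing space) finds nothing: the class attribute is a whitespace-separated list of tokens, so the token is "proposition", never "proposition ".

```python
from html.parser import HTMLParser


class SectionClassCollector(HTMLParser):
    """Collect the class tokens of every <section> tag in a document."""

    def __init__(self):
        super().__init__()
        self.section_classes = []  # one list of class tokens per <section>

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        # attrs is a list of (name, value) pairs; value may be None
        class_value = dict(attrs).get("class") or ""
        # str.split() with no argument also drops leading/trailing whitespace
        self.section_classes.append(class_value.split())


def count_sections(html, class_name):
    """Count <section> tags whose class list contains class_name exactly."""
    parser = SectionClassCollector()
    parser.feed(html)
    parser.close()
    return sum(class_name in tokens for tokens in parser.section_classes)


sample = '<section class="proposition ">a</section><section class="other">b</section>'
print(count_sections(sample, "proposition"))   # 1
print(count_sections(sample, "proposition "))  # 0
```

Running count_sections(get_html(URL).text, "proposition") on the page from the question would indicate whether the markup is in the served HTML at all. If it returns 0, the sections are probably not in the raw response, and no BeautifulSoup selector will find them.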

Using urllib instead of requests solved the problem for me :)

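import urllib.request
from bs4 import BeautifulSoup

URL = "https://auto.ria.com/newauto/marka-jeep/"

# urlopen returns a file-like response that BeautifulSoup can read directly
with urllib.request.urlopen(URL) as html:
    soup = BeautifulSoup(html, "html.parser")
print(soup)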
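If switching libraries changes the result, the two requests must differ somehow. One visible difference is the headers: urllib sends its own default User-Agent (Python-urllib/3.x) unless told otherwise, while the question's code sends a browser User-Agent plus "accept: */*". The answer does not say why urllib works, so treat the header explanation as an assumption. A minimal sketch for testing it builds a urllib.request.Request with the question's headers and compares the result with a plain fetch; build_request and fetch_with_headers are hypothetical helper names.

```python
import urllib.request

URL = "https://auto.ria.com/newauto/marka-jeep/"

# Same headers the question passes to requests.get
HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
    "accept": "*/*",
}


def build_request(url, headers=None):
    """Build a urllib Request carrying the given headers (hypothetical helper)."""
    return urllib.request.Request(url, headers=headers or {})


def fetch_with_headers(url, headers=None, timeout=10):
    """Fetch url with urllib and return the decoded body (hypothetical helper)."""
    with urllib.request.urlopen(build_request(url, headers), timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_with_headers(URL, HEADERS) and fetch_with_headers(URL), then checking whether "proposition" appears in each body, would show whether the headers alone change what the server returns.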

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this, please credit the site URL or the original address. For questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM