I am trying to learn about parsing HTML pages with Python. When I needed to extract some links from a website's sections, I realized that the "soup" object that should contain all sections with the specified class is empty. Is this normal, or am I doing something wrong?
Code:
from bs4 import BeautifulSoup
import requests
URL = "https://auto.ria.com/newauto/marka-jeep/"
HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
    "accept": "*/*",
}

def get_html(url, params=None):
    req = requests.get(url, headers=HEADERS, params=params)
    return req

def get_content(html):
    soup = BeautifulSoup(html, "html.parser")
    section_items = soup.find_all("section", class_="proposition ")
    cars = []
    for item in section_items:
        cars.append({
            "title": item.find("div", class_="proposition_title").get_text(strip=True),
            "link": item.find("a", class_="proposition_link").get("href"),
        })
    print(cars)

def parse():
    html = get_html(URL)
    if html.status_code == 200:
        get_content(html.text)
    else:
        print("Something went wrong")

parse()
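Before blaming the HTTP library, it can help to test the selector against a small HTML snippet you control. This sketch (using made-up markup that imitates the structure the code above expects) shows that find_all silently returns an empty list when the class string does not match exactly, e.g. because of a stray trailing space like the one in class_="proposition ":

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the question's code expects.
html = """
<section class="proposition">
  <div class="proposition_title">Jeep Wrangler</div>
  <a class="proposition_link" href="/newauto/auto-jeep-wrangler/">details</a>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# The exact class name matches one section...
print(len(soup.find_all("section", class_="proposition")))   # 1

# ...while a class string with a trailing space matches nothing,
# and find_all silently yields an empty result set.
print(len(soup.find_all("section", class_="proposition ")))  # 0
```

If the selector works against saved or hand-written HTML but not against the live response, the server is most likely sending your script different markup than it sends your browser.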
Using urllib instead of requests solved the problem for me :)
import urllib.request
from bs4 import BeautifulSoup

url = "https://auto.ria.com/newauto/marka-jeep/"
html = urllib.request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
print(soup)
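Putting the two pieces together, the full script might look like the sketch below. Assumptions: the class names are taken from the question and may have changed on the live site, the trailing space in class_="proposition " is treated as a typo and dropped, and the parsing is factored into a function so it can be tested against saved HTML without hitting the network:

```python
import urllib.request

from bs4 import BeautifulSoup

URL = "https://auto.ria.com/newauto/marka-jeep/"

def get_content(html):
    """Extract a title/link pair from each proposition section."""
    soup = BeautifulSoup(html, "html.parser")
    cars = []
    for item in soup.find_all("section", class_="proposition"):
        title = item.find("div", class_="proposition_title")
        link = item.find("a", class_="proposition_link")
        if title and link:  # skip sections missing either element
            cars.append({
                "title": title.get_text(strip=True),
                "link": link.get("href"),
            })
    return cars

if __name__ == "__main__":
    # urllib identifies itself as "Python-urllib/3.x"; apparently this
    # site accepts that client even when it serves requests differently.
    with urllib.request.urlopen(URL) as response:
        print(get_content(response.read()))
```

Returning the list instead of printing inside get_content also makes the result reusable by whatever code calls the parser next.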