Web scraping with BeautifulSoup .find() always returns None

Question

Relevant part of the DOM: Screenshot of the DOM

This is the code I wrote:

from bs4 import BeautifulSoup
import requests

URL = 'https://www.cheapflights.com.sg/flight-search/SIN-KUL/2022-06-04?sort=bestflight_a&attempt=3&lastms=1653844067064'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
flight = soup.find('div', class_='resultWrapper')
print(flight)

print(flight) always outputs None. I have tried other div tags with different class names, but the result is still None. The soup itself seems fine: print(soup) outputs a text version of the DOM, so the problem seems to be in the soup.find() line.

Any suggestions on how I can get something other than None? Thank you!

Answer 1

That's because of the User-Agent. If I curl the page without changing the default User-Agent, the site returns a different page instead of the search results.

Change your code like this so that your program is not detected:

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ..."
}
page = requests.get(URL, headers=headers)
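
To check whether the header change actually helps, it can be useful to separate the download from the lookup and to test for None explicitly. The following is only a rough sketch: fetch_html and find_first are hypothetical helper names, not part of requests or BeautifulSoup, and the User-Agent value is left to the caller because the string above is truncated.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup, Tag


def fetch_html(url: str, user_agent: str) -> str:
    """Download url while sending the given User-Agent header.

    user_agent is an assumption: pass the full string your own browser sends.
    """
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def find_first(html: str, tag: str, class_name: str) -> Optional[Tag]:
    """Return the first <tag> element carrying class_name, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(tag, class_=class_name)


def describe(html: str, tag: str, class_name: str) -> str:
    """Summarise the lookup so a missing element is obvious when printed."""
    element = find_first(html, tag, class_name)
    if element is None:
        return f"no <{tag} class='{class_name}'> in the HTML that was returned"
    return element.get_text(strip=True)
```

With these, print(describe(fetch_html(URL, "<your browser's User-Agent>"), 'div', 'resultWrapper')) shows either the text of the matching element or a clear message. If the message still says the element is missing, the div is not in the HTML the server sent back, so comparing print(requests.utils.default_user_agent()) with your browser's header and inspecting the raw response is a reasonable next step.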
