
Web scraping with BeautifulSoup .find() always returns None

Relevant part of the DOM: Screenshot of the DOM

This is the code I wrote:

from bs4 import BeautifulSoup
import requests

URL = 'https://www.cheapflights.com.sg/flight-search/SIN-KUL/2022-06-04?sort=bestflight_a&attempt=3&lastms=1653844067064'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
flight = soup.find('div', class_='resultWrapper')
print(flight)

Whenever print(flight) runs, it prints None. I have tried other div tags with different class names, but find() still returns None every time. The soup itself seems fine, because print(soup) prints a text version of the DOM, so the problem seems to be in the soup.find() line.

Any suggestions on how I can get something other than None? Thank you!

That's because of the User-Agent. If I curl the page without changing the default User-Agent, the server returns a different page instead of the flight results.
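One way to check this yourself is to look at what requests sends and what comes back before parsing anything. This is only a rough sketch: the helper names (default_agent, has_marker, diagnose) are made up for illustration, and it assumes the resultWrapper class from the question appears in the raw HTML of a real results page.

```python
import requests

# Class name taken from the question; assumed to appear only on real result pages.
MARKER = "resultWrapper"


def default_agent() -> str:
    """Return the User-Agent string requests sends when none is set."""
    return requests.utils.default_user_agent()


def has_marker(html: str, marker: str = MARKER) -> bool:
    """Return True if the raw HTML mentions the marker at all."""
    return marker in html


def diagnose(url: str) -> dict:
    """Fetch the URL with default headers and summarise the response."""
    page = requests.get(url, timeout=10)
    return {
        "status": page.status_code,
        "user_agent": page.request.headers.get("User-Agent"),
        "has_marker": has_marker(page.text),
    }
```

Calling diagnose(URL) with the URL from the question shows the status code, the User-Agent that was actually sent, and whether the class name is present in the response. If has_marker is False, find() has nothing to match, which is why it returns None.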

Change your code like this so that your program is not detected as a script:

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ..."
}
page = requests.get(URL, headers=headers)
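Putting it together with the code from the question, the whole script might look roughly like this. Treat it as a sketch under assumptions: fetch_html and find_flight are hypothetical names, the timeout and raise_for_status() call are additions that were not in the question, and the User-Agent value is left truncated as above, so replace it with the full string from your own browser.

```python
import requests
from bs4 import BeautifulSoup

# Truncated on purpose, as in the answer: paste your browser's full User-Agent here.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ..."}


def fetch_html(url: str) -> str:
    """Download the page while sending a browser-like User-Agent."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def find_flight(html: str):
    """Return the first div with class resultWrapper, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", class_="resultWrapper")
```

Use it as flight = find_flight(fetch_html(URL)) and check for None before using the result, since find() returns None whenever the page that came back does not contain that div.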


 