BeautifulSoup is not parsing html correctly on terminal, but works in my Jupyter Notebook

Question

I'm currently learning basic web scraping with Python and Beautiful Soup. I wrote some code in a Jupyter Notebook and it worked, but when I run the same code from a .py file in my terminal, BeautifulSoup does not seem to parse the page correctly, and nothing gets printed.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome(executable_path="/Users/Shiva/Downloads/chromedriver")

driver.get('https://www.google.com/flights?hl=en#flt=/m/03v_5.IAD.2019-02-10*IAD./m/03v_5.2019-02-11;c:USD;e:1;sd:1;t:f')

load_all_flights = driver.find_element_by_xpath('//*[@id="flt-app"]/div[2]/main[4]/div[7]/div[1]/div[3]/div[4]/div[5]/div[1]/div[3]/jsl/a[1]/span[1]/span[2]')

load_all_flights.click()

soup = BeautifulSoup(driver.page_source, 'html.parser')

info = soup.find_all('div', class_="gws-flights-results__collapsed-itinerary gws-flights-results__itinerary")

for trip in info:
    price = trip.find('div', class_="flt-subhead1 gws-flights-results__price gws-flights-results__cheapest-price")
    if price is None:
        price = trip.find('div', class_="flt-subhead1 gws-flights-results__price")
    type_of_flight = trip.find('div', class_="gws-flights-results__stops flt-subhead1Normal gws-flights-results__has-warning-icon")
    if type_of_flight is None:
        type_of_flight = trip.find('div', class_="gws-flights-results__stops flt-subhead1Normal")
    print(str(type_of_flight.text).strip() + " : " + str(price.text).strip())

In the Jupyter Notebook, I get a list of flight types and prices, such as "nonstop: $500",

but it doesn't work in the terminal, where the "info" variable is an empty list.

Answer 1

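You need to wait for the page to render before you look up elements and parse it. Jupyter gets data because it is slow enough (or because your code is split across cells) that the page has finished rendering by the time you parse it. Run as a script, every statement executes back to back, so the page can be parsed before the results are there. Adding an explicit wait should do the trick: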

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome(executable_path="C:\\Users\\Andersen\\Desktop\\Tools\\chromedriver.exe")

driver.get('https://www.google.com/flights?hl=en#flt=/m/03v_5.IAD.2019-02-10*IAD./m/03v_5.2019-02-11;c:USD;e:1;sd:1;t:f')

xpath = '//*[@id="flt-app"]/div[2]/main[4]/div[7]/div[1]/div[3]/div[4]/div[5]/div[1]/div[3]/jsl/a[1]/span[1]/span[2]'

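# Wait up to 10 seconds for the "load all flights" link to become clickable.
# until() returns the element itself, so no second lookup is needed.
wait = WebDriverWait(driver, 10)
load_all_flights = wait.until(EC.element_to_be_clickable((By.XPATH, xpath)))

load_all_flights.click()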

soup = BeautifulSoup(driver.page_source, 'html.parser')

info = soup.find_all('div', class_="gws-flights-results__collapsed-itinerary gws-flights-results__itinerary")

for trip in info:
    price = trip.find('div', class_="flt-subhead1 gws-flights-results__price gws-flights-results__cheapest-price")
    if price is None:
        price = trip.find('div', class_="flt-subhead1 gws-flights-results__price")
    type_of_flight = trip.find('div', class_="gws-flights-results__stops flt-subhead1Normal gws-flights-results__has-warning-icon")
    if type_of_flight is None:
        type_of_flight = trip.find('div', class_="gws-flights-results__stops flt-subhead1Normal")
    print(str(type_of_flight.text).strip() + " : " + str(price.text).strip())

Output (as of 2019-02-02):

2 stops : $588
Nonstop : $749
Nonstop : $749
1 stop : $866
2 stops : $1,271
2 stops : $1,294
2 stops : $1,294
2 stops : $1,805
2 stops : $1,805
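
If it helps to see what "waiting for the page to render" means, here is a minimal sketch of the polling idea behind an explicit wait, written in plain Python so it runs without a browser. It is not Selenium's actual implementation: wait_until and FakePage are hypothetical names made up for this illustration, and the single result row is sample data, not real Google Flights output. The idea is to keep calling a condition until it returns something truthy, and to give up after a timeout.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. (Hypothetical helper for illustration.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page whose results only appear on the third look."""

    def __init__(self):
        self.looks = 0

    def results(self):
        self.looks += 1
        # Sample data for the sketch, not scraped from anywhere.
        return ["Nonstop : $749"] if self.looks >= 3 else []


page = FakePage()

# Reading the results once, right away, gives an empty list,
# which is what the script in the question runs into.
print(page.results())

# Polling keeps asking until the results show up.
rows = wait_until(page.results, timeout=2.0, poll_interval=0.01)
print(rows)
```

With Selenium you would not write this loop yourself. WebDriverWait(driver, 10).until(...) plays the role of wait_until, and the expected condition (here EC.element_to_be_clickable) plays the role of the condition function.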

solution1
0 ACCEPTED 2019-02-02 19:26:34
