Can't View Complete Page Source in Selenium

Question

When I navigate to the site manually in Chrome and view the HTML source, I can see the full page source. When I load the same page through Selenium, however, driver.page_source does not contain the complete source.

from bs4 import BeautifulSoup
from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path=r"C:\Python27\Scripts\chromedriver.exe")
driver.get('http://www.magicbricks.com/')

# Click the buy tab and give the page time to update
driver.find_element_by_id("buyTab").click()
time.sleep(5)

# Type the search location
driver.find_element_by_id("keyword").send_keys("Navi Mumbai")
time.sleep(5)

# Submit the search and wait for the results to load
driver.find_element_by_id("btnPropertySearch").click()
time.sleep(30)

# Parse whatever Selenium reports as the page source
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content, "lxml")

print(soup.prettify())

Answer 1

The website may be blocking or restricting Selenium's user agent. An easy test is to change the user agent and see whether the full source comes back. More info at this question:

Change user agent for selenium driver

Quoting:

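from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
# Replace the placeholder with the user agent string you want to test
opts.add_argument("user-agent=whatever you want")

# Newer Selenium releases take this as options=opts instead of chrome_options=opts
driver = webdriver.Chrome(chrome_options=opts)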
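
To confirm that the override actually took effect before comparing page sources, one option is to ask the browser what user agent it reports. The following is a minimal sketch, not part of the quoted answer: the helper names reported_user_agent and user_agent_was_applied are hypothetical, and the only thing assumed about driver is that it has an execute_script method, as Selenium's WebDriver does.

```python
def reported_user_agent(driver):
    """Return the user agent string the browser itself reports."""
    # navigator.userAgent is the standard DOM property for this value
    return driver.execute_script("return navigator.userAgent")


def user_agent_was_applied(driver, expected):
    """Return True if the browser reports exactly the requested user agent."""
    return reported_user_agent(driver) == expected
```

With the driver created above, user_agent_was_applied(driver, "whatever you want") should be true if the argument was picked up. If it is true and the page source is still incomplete, the user agent is probably not the cause.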

Answer 2

Try something like:

import time
time.sleep(5)
content = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")

instead of driver.page_source.

Dynamic web pages often need to be rendered by JavaScript before their content is complete.
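
Rather than guessing a fixed sleep, one possible approach is to poll the rendered DOM until it stops changing. This is a rough sketch under stated assumptions, not a definitive fix: wait_for_stable_html and its parameters are hypothetical names, it reads document.documentElement.outerHTML (a variation on the innerHTML call above that also includes the html tag), and it only requires that driver has an execute_script method.

```python
import time


def wait_for_stable_html(driver, timeout=30.0, interval=1.0, stable_checks=2):
    """Poll the rendered DOM until it stops changing or the timeout expires.

    Returns the most recent HTML snapshot in either case.
    """
    script = "return document.documentElement.outerHTML"
    deadline = time.time() + timeout
    previous = driver.execute_script(script)
    unchanged = 0
    while time.time() < deadline:
        time.sleep(interval)
        current = driver.execute_script(script)
        if current == previous:
            # Same snapshot as last time: count it toward stability
            unchanged += 1
            if unchanged >= stable_checks:
                return current
        else:
            # The DOM changed, so start counting again from this snapshot
            unchanged = 0
            previous = current
    return previous
```

A call such as content = wait_for_stable_html(driver) could then stand in for the fixed sleep followed by execute_script. A page that keeps mutating (for example, a rotating banner) may never look stable, in which case the function simply returns the last snapshot once the timeout expires.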

2 answers

solution1
0 2016-08-19 20:45:03

solution2
0 2020-08-23 15:55:10
