简体   繁体   中英

Can't View Complete Page Source in Selenium

When I view the source HTML after manually navigating to the site via Chrome I can see the full page source but on loading the page source via selenium I'm not getting the complete page source.

from bs4 import BeautifulSoup
from selenium import webdriver
import sys,time


driver = webdriver.Chrome(executable_path=r"C:\Python27\Scripts\chromedriver.exe")
driver.get('http://www.magicbricks.com/')


driver.find_element_by_id("buyTab").click()

time.sleep(5)
driver.find_element_by_id("keyword").send_keys("Navi Mumbai")

time.sleep(5)
driver.find_element_by_id("btnPropertySearch").click()

time.sleep(30)

content = driver.page_source.encode('utf-8').strip()

soup = BeautifulSoup(content,"lxml")

print soup.prettify()

The website is possibly blocking or restricting the user agent for selenium. An easy test is to change the user agent and see if that does it. More info at this question:

Change user agent for selenium driver

Quoting:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
opts.add_argument("user-agent=whatever you want")

driver = webdriver.Chrome(chrome_options=opts)

Try something like:

import time
time.sleep(5)
content = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")

instead of driver.page_source .

Dynamic web pages are often needed to be rendered by JavaScript.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM