
Unable to scrape a web page using a class implementation of Selenium

I am using Selenium to scrape a web page that is dynamically generated by JavaScript. It works fine when I run the calls directly in an interactive Python session from cmd, but it doesn't work when I put the same functionality in a class.

My class implementation is:

    from selenium import webdriver


    class web_scraper():
        def __init__(self):
            # start chrome driver
            self.driver = webdriver.Chrome(executable_path="./config/chromedriver.exe")

        # scrape web page from specified url
        def scrape_page(self, url):
            html = None
            try:
                # load page
                self.driver.get(url)

                # read html
                html = self.driver.execute_script("return document.documentElement.innerHTML;")
            except Exception as e:
                print('[Error:] Scraping failed.')
                print(f'[Exception:] {e}')

            return html


    if __name__ == '__main__':
        url = "https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage9"
        scraper = web_scraper()
        content = scraper.scrape_page(url)

The code I ran in the terminal is:

    driver = webdriver.Chrome(executable_path='E:/Projects/Python_Projects/WebScraping/config/chromedriver.exe')
    driver.get("https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage30")
    content = driver.execute_script("return document.documentElement.innerHTML;")

The output of the class implementation is:

<head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <link type="text/css" rel="stylesheet" href="Wipp.css">
    <title>WIPP</title>
  <link rel="stylesheet" href="https://wipp.edmundsassoc.com/Wipp/wipp/gwt/standard/standard.css"><script src="https://wipp.edmundsassoc.com/Wipp/wipp/0D3421F8F9508D2F958C63CE2A48BAD8.cache.js"></script></head>

  <body>
    <script type="text/javascript" language="javascript" src="wipp/wipp.nocache.js"></script>
    <iframe src="javascript:''" id="__gwt_historyFrame" tabindex="-1" style="position:absolute;width:0;height:0;border:0"></iframe>


</body>

When I run the same commands in the Python terminal, however, the output contains the fully rendered page.

Any help with this would be appreciated. Thanks!

I am using Windows and Python 3.6.

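Add a time.sleep() call (with import time at the top of the file) after loading the URL. The page is built by JavaScript after it loads, so the script needs to pause before reading the HTML. In the terminal, the time it takes to type the next command provides that pause for free.

    self.driver.get(url)
    time.sleep(10)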
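
A fixed sleep either wastes time or turns out too short on a slow connection. A possible alternative is to poll until the page looks rendered. The sketch below is only an illustration: the helper name `wait_for_rendered_html`, its parameters, and the "body has non-empty text" readiness check are assumptions, not something from the question, and the check may need adapting to the target page. Selenium also ships `WebDriverWait` in `selenium.webdriver.support.ui` for this kind of polling.

```python
import time


def wait_for_rendered_html(driver, timeout=15.0, poll_interval=0.5):
    """Poll until <body> has rendered text, then return the page HTML.

    `driver` is any object exposing Selenium's `execute_script(script)` method.
    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Assumption: the page counts as "rendered" once <body> has non-empty text.
    ready_script = (
        "return document.body !== null"
        " && document.body.innerText.trim().length > 0;"
    )
    html_script = "return document.documentElement.innerHTML;"

    deadline = time.monotonic() + timeout
    while True:
        # Ask the browser whether the readiness condition holds yet.
        if driver.execute_script(ready_script):
            return driver.execute_script(html_script)
        # Give up once the timeout has elapsed.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page did not render within {timeout} seconds")
        time.sleep(poll_interval)
```

Inside `scrape_page`, this would replace the sleep and the `execute_script` call: after `self.driver.get(url)`, use `html = wait_for_rendered_html(self.driver)`. The existing `except Exception` block would then also report a timeout.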
