简体   繁体   中英

Using Selenium in Python to save a webpage on Firefox

I am trying to use Selenium in Python to save webpages on MacOS Firefox .

So far, I have managed to click COMMAND + S to pop up the SAVE AS window . However,

I don't know how to:

  1. change the directory of the file,
  2. change the name of the file, and
  3. click the SAVE AS button.

Could someone help?

Below is the code I have use to click COMMAND + S :

ActionChains(browser).key_down(Keys.COMMAND).send_keys("s").key_up(Keys.COMMAND).perform()

Besides, the reason for me to use this method is that I encounter Unicode Encode Error when I :-

  1. write the page_source to a html file and
  2. store scrapped information to a csv file.

Write to a html file:

file_object = open(completeName, "w")
html = browser.page_source
file_object.write(html)
file_object.close() 

Write to a csv file:

csv_file_write.writerow(to_write)

Error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\\xf8' in position 1: ordinal not in range(128)

What you are trying to achieve is impossible to do with Selenium. The dialog that opens is not something Selenium can interact with.

The closes thing you could do is collect the page_source which gives you the entire HTML of a single page and save this to a file.

import codecs

completeName = os.path.join(save_path, file_name)
file_object = codecs.open(completeName, "w", "utf-8")
html = browser.page_source
file_object.write(html)

If you really need to save the entire website you should look into using a tool like AutoIT. This will make it possible to interact with the save dialog.

with open('page.html', 'w') as f:
    f.write(driver.page_source)

You cannot interact with system dialogs like save file dialog. If you want to save the page html you can do something like this:

page = driver.page_source
file_ = open('page.html', 'w')
file_.write(page)
file_.close()

This is a complete, working example of the answer RemcoW provided:

You first have to install a webdriver, eg pip install selenium chromedriver_installer .

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# core modules
import codecs
import os

# 3rd party modules
from selenium import webdriver


def get_browser():
    """Get the browser (a "driver")."""
    # find the path with 'which chromedriver'
    path_to_chromedriver = ('/usr/local/bin/chromedriver')
    browser = webdriver.Chrome(executable_path=path_to_chromedriver)
    return browser


save_path = os.path.expanduser('~')
file_name = 'index.html'
browser = get_browser()

url = "https://martin-thoma.com/"
browser.get(url)

complete_name = os.path.join(save_path, file_name)
file_object = codecs.open(complete_name, "w", "utf-8")
html = browser.page_source
file_object.write(html)
browser.close()

you can achieve this with pyautogui library but if you have to save multiple pages in a loop, you can't execute any other tasks on the screen.

import pyautogui
import time 
pyautogui.hotkey('ctrl', 's')
time.sleep(1)   
pyautogui.typewrite("file name")
time.sleep(1)
pyautogui.hotkey('enter')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM