
How to set the page_source of selenium yourself in python3?

I am writing an application using Selenium. I know that you can use webdriver.Firefox's get method to load a webpage like this:

    import os
    from selenium import webdriver

    driver = webdriver.Firefox(executable_path=r'geckodriver')
    driver.get('file://' + os.path.dirname(os.path.abspath(__file__)) + '/index.html')
    driver.page_source  # get the source

But instead of opening a webpage and getting the source from there, I want to provide the source myself like this:

    driver.page_source = '<body><h1>Hello</h1></body>'

And then be able to perform the normal Selenium operations, for example:

    driver.find_element_by_tag_name('h1')

But since Firefox.page_source is a @property, I can't set it manually. Does anyone know a workaround for that? Any suggestions will be highly appreciated.

You can open it with a data URL, which is prefixed with the data: scheme:

    htmlString = '<body><h1>Hello</h1></body>'
    driver.get("data:text/html;charset=utf-8," + htmlString)
    h1 = driver.find_element_by_tag_name('h1')
    print(h1.text)

Length limitation: 65535 characters (the exact cap depends on the browser).
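
One caveat with the data URL approach: characters such as # and % have special meaning in a URL, so raw markup that contains them may be cut off or misread. Below is a minimal sketch of percent-encoding the markup first with the standard library's urllib.parse.quote. The helper name build_data_url is hypothetical and not part of Selenium.

```python
from urllib.parse import quote


def build_data_url(html: str) -> str:
    """Return a data: URL that carries the given HTML markup."""
    # quote() percent-encodes '#', '%', spaces and non-ASCII characters
    # (as UTF-8), which would otherwise be misread as URL syntax.
    return "data:text/html;charset=utf-8," + quote(html)


url = build_data_url('<body><h1 id="a">50% #1</h1></body>')
print(url)
```

The result can be passed to driver.get() in place of the concatenated string. The length limit still applies to the encoded URL, which is longer than the raw markup.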

To avoid the length limitation, you can load a minimal page and inject the markup with the execute_script() method:

    htmlString = '<html><body></body></html>'
    driver.get("data:text/html;charset=utf-8," + htmlString)
    largeHTMLString = '<h1>Hello</h1>'
    driver.execute_script('document.body.innerHTML=arguments[0]', largeHTMLString)
    h1 = driver.find_element_by_tag_name('h1')
    print(h1.text)
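
The injection steps can also be wrapped in a small helper. This is only a sketch: set_body_html is a hypothetical name, about:blank is assumed as the empty starting page instead of a data URL, and the driver argument is assumed to be a Selenium WebDriver instance.

```python
def set_body_html(driver, html):
    """Replace the body of a blank page with the given markup.

    `driver` is assumed to be a Selenium WebDriver instance; this helper
    is hypothetical and not part of Selenium itself.
    """
    # Start from an empty document so that document.body exists.
    driver.get("about:blank")
    # The markup travels as a script argument, so it is not subject to
    # URL length limits and needs no URL escaping.
    driver.execute_script("document.body.innerHTML = arguments[0];", html)
```

With a real driver this would be called as set_body_html(driver, '<h1>Hello</h1>') before looking up elements. Note that script elements inserted through innerHTML are not executed by the browser, so this suits static markup. Also, in Selenium 4.3 and later find_element_by_tag_name has been removed; the equivalent call is driver.find_element(By.TAG_NAME, 'h1'), with By imported from selenium.webdriver.common.by.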

If you don't mind parsing with Beautiful Soup, this is how I would handle that problem:

    from bs4 import BeautifulSoup

    # Define the markup to parse
    page_source = '<body><h1>Hello</h1></body>'

    # Parse it using Beautiful Soup (the 'lxml' parser requires the lxml package)
    soup = BeautifulSoup(page_source, 'lxml')

    # Search for the elements by tag name
    headings = soup.find_all('h1')

Hope that helps.


 