
How to add custom HTML to a Mechanize page object

I was wondering whether it is possible to add custom HTML to a Mechanize page object. The goal is to work around JavaScript code that creates a form: I would add the HTML that the JavaScript generates to the Mechanize page object (fetched via agent.get(uri)) and have the Mechanize agent submit the form as if it were really there. It should be possible, since the form is created directly on the page without any outside requests except for a JPEG. I cannot use Selenium or similar tools; I need to stick with Mechanize and Nokogiri. Any help or direction would be appreciated!

Look at "Scraping Data" in the Mechanize documentation. Because you can access and search the Nokogiri document, you can also modify it.

Modifying a document is easy with Nokogiri, which Mechanize uses internally:

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.example.org')
doc = page.parser                        # the Nokogiri document behind the page
first_p = doc.at('p')                    # first <p> element in the document
first_p.to_html                          # => "<p>This domain is established to be used for illustrative examples in documents. You may use this\n    domain in examples without prior coordination or asking for permission.</p>"
first_p.children = '
<form action="action_page.php">
First name:<br>
<input type="text" name="firstname" value="First name"><br>
Last name:<br>
<input type="text" name="lastname" value="Last name"><br><br>
<input type="submit" value="Submit">
</form>' 
first_p.to_html                          # => "<p>\n    <form action=\"action_page.php\">\n    First name:<br>\n    <input type=\"text\" name=\"firstname\" value=\"First name\"><br>\n    Last name:<br>\n    <input type=\"text\" name=\"lastname\" value=\"Last name\"><br><br>\n    <input type=\"submit\" value=\"Submit\">\n    </form></p>"

Looking up one level, to the parent:

page.parser.at('p').parent.to_html # => "<div>\n    <h1>Example Domain</h1>\n    <p>\n    <form action=\"action_page.php\">\n    First name:<br>\n    <input type=\"text\" name=\"firstname\" value=\"First name\"><br>\n    Last name:<br>\n    <input type=\"text\" name=\"lastname\" value=\"Last name\"><br><br>\n    <input type=\"submit\" value=\"Submit\">\n    </form></p>\n    <p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>\n</div>"

Whether you can use Mechanize with the modified HTML is for you to figure out.
