
How to add custom HTML to a Mechanize page object

I was wondering whether it is possible to add custom HTML to a Mechanize page object. The goal is to sidestep JavaScript code that creates a form: I would add the HTML that the JavaScript generates to the Mechanize page object (fetched via agent.get(uri)) and have the Mechanize agent submit the form as if it really were there. It should be possible, since the form is created directly on the page without any external calls except for a JPEG. I cannot use Selenium or similar tools; I need to stick with Mechanize and Nokogiri. Any help or even a pointer in the right direction would be appreciated!

Look at "Scraping Data" in the Mechanize documentation. Because you can access and search the Nokogiri document, you can also modify it.

Modifying a document is easy with Nokogiri, which Mechanize uses internally:

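require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.example.org')
# page.parser is the Nokogiri document that backs the Mechanize page
doc = page.parser
# Grab the first <p> node; its children are replaced with the form below
first_p = doc.at('p')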
first_p.to_html                          # => "<p>This domain is established to be used for illustrative examples in documents. You may use this\n    domain in examples without prior coordination or asking for permission.</p>"
first_p.children = '
<form action="action_page.php">
First name:<br>
<input type="text" name="firstname" value="First name"><br>
Last name:<br>
<input type="text" name="lastname" value="Last name"><br><br>
<input type="submit" value="Submit">
</form>' 
first_p.to_html                          # => "<p>\n    <form action=\"action_page.php\">\n    First name:<br>\n    <input type=\"text\" name=\"firstname\" value=\"First name\"><br>\n    Last name:<br>\n    <input type=\"text\" name=\"lastname\" value=\"Last name\"><br><br>\n    <input type=\"submit\" value=\"Submit\">\n    </form></p>"

Looking up one level, to the parent:

page.parser.at('p').parent.to_html # => "<div>\n    <h1>Example Domain</h1>\n    <p>\n    <form action=\"action_page.php\">\n    First name:<br>\n    <input type=\"text\" name=\"firstname\" value=\"First name\"><br>\n    Last name:<br>\n    <input type=\"text\" name=\"lastname\" value=\"Last name\"><br><br>\n    <input type=\"submit\" value=\"Submit\">\n    </form></p>\n    <p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>\n</div>"

Whether you can use Mechanize with the modified HTML is for you to figure out.
