Chrome/Firefox web browser automation for collecting data
I would like to browse a website automatically to collect some data. There's a page with a form. The form consists of a select and a submit button. Selecting an option in the select and clicking the submit button leads to another page with some tables of related data. I need to collect this data and save it to a file for each option. I will probably need to go back to the first page to repeat the task for each option. The catch is that I don't know the exact number of options beforehand.
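Not knowing the number of options in advance need not be a problem, because the options can be read from the form page at run time. The following is only a minimal sketch of that loop in plain Java 11+, not browser automation: it assumes the form is submitted as a URL-encoded POST, that the pages work without JavaScript, and that the markup is simple enough for regular expressions (double-quoted `value` attributes, no nested tables). The class name `FormScrapeSketch`, the field name `item`, and the sample HTML are all hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormScrapeSketch {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    // Captures the double-quoted value attribute of each <option> tag.
    private static final Pattern OPTION_VALUE =
            Pattern.compile("<option\\b[^>]*\\bvalue=\"([^\"]*)\"", FLAGS);
    private static final Pattern ROW = Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", FLAGS);
    private static final Pattern CELL = Pattern.compile("<t[dh]\\b[^>]*>(.*?)</t[dh]>", FLAGS);

    /** Returns the value of every option found, however many there are. */
    static List<String> optionValues(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = OPTION_VALUE.matcher(html);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    /** Turns each table row into one CSV line (HTML entities are not decoded). */
    static List<String> tableRowsAsCsv(String html) {
        List<String> lines = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop inner tags, trim whitespace, and escape quotes for CSV.
                String text = cell.group(1).replaceAll("<[^>]+>", "").trim();
                cells.add("\"" + text.replace("\"", "\"\"") + "\"");
            }
            lines.add(String.join(",", cells));
        }
        return lines;
    }

    /** Submits one option as a URL-encoded POST and returns the result page (assumed form behaviour). */
    static String submit(HttpClient client, String actionUrl, String field, String value)
            throws IOException, InterruptedException {
        String body = URLEncoder.encode(field, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the real form page and result page.
        String formPage = "<select name=\"item\"><option value=\"a\">A</option>"
                + "<option value=\"b\">B</option></select>";
        String resultPage = "<table><tr><th>Name</th><th>Qty</th></tr>"
                + "<tr><td>Foo</td><td> 3 </td></tr></table>";
        for (String value : optionValues(formPage)) {
            // In a real run, resultPage would come from submit(client, actionUrl, "item", value)
            // and the CSV lines would be written to one file per option.
            System.out.println(value + " -> " + tableRowsAsCsv(resultPage));
        }
    }
}
```

Because each submission is an independent request, there is no need to "go back" to the first page in this variant; the loop simply moves on to the next option value. If the site depends on JavaScript or session state, a real or headless browser tool is likely the better fit.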
My idea is to do this task, preferably, with Firefox or Chrome. I think the only way to do that is via programming.
Could someone point me to an easy and fast way to do this task? I know a little bit about Java, JavaScript and Python.
You might want to google for a "web browser automation" tool like Selenium. Although it is not a perfect fit for the purpose, I think it can be used to implement your requirement.
Since the task is relatively well constrained, I would avoid Selenium (it's a little brittle), and instead try this approach:
I found a solution to my problem. It's called HtmlUnit:
http://htmlunit.sourceforge.net/gettingStarted.html
HtmlUnit is a "GUI-Less browser for Java programs".
It allows web browsing and data collection using Java, and it's very simple and easy to use.
Not exactly what I asked for, but it's better. At least to me.