
HtmlUnit web scraping: anchor tag with a drop-down menu that uses JavaScript

Is it possible to use HtmlUnit to click a link that shows a drop-down list of links when you mouse over it? Clicking the link itself does nothing; it only shows the list of links that drops down on mouseover. I would like to click one of the drop-down links and grab the web page associated with it.

The problem seems to be that the anchor uses JavaScript and is also a drop-down menu. If the anchor had no JavaScript and no drop-down, I would not have any problems.

Here is the pertinent JavaScript code:

<script language='JavaScript' type='text/javascript'>
<!--
function mmLoadMenus(){
  // Creates the pop-up menu object and stores it in a global variable
  window.mm_menu_0805151542_0 = new Menu("root",211,23,"Arial, Helvetica, sans-serif",11,"#FFFFFF","#FFFFFF","#056CB9","#014D98","left","middle",3,0,1000,-5,7,true,false,true,2,true,false);
  // addMenuItem(label, action): the label uses &nbsp; for spaces; the action is JavaScript run when the item is clicked
  mm_menu_0805151542_0.addMenuItem("View&nbsp;Tax&nbsp;Sales","window.open('TCTaxSaleBrief.asp', '_blank','width=800,height=580,scrollbars=1,resizable=yes,top=50,left=100');");
  mm_menu_0805151542_0.addMenuItem("Registration&nbsp;Renewal&nbsp;Reprint","window.open('vrRenewal.asp', '_blank','width=800,height=580,scrollbars=1,resizable=yes,top=50,left=100');");
  mm_menu_0805151542_0.addMenuItem("Drivers&nbsp;License","window.open('http://www.dds.ga.gov/', '_blank');");
  mm_menu_0805151542_0.addMenuItem("Online&nbsp;Tag&nbsp;Renewals","location='../TaxCommissioner/TagRenewal.html'");
  mm_menu_0805151542_0.hideOnMouseOut=true;
  mm_menu_0805151542_0.bgColor='#CCCCCC';
  mm_menu_0805151542_0.menuBorder=0;
  mm_menu_0805151542_0.menuLiteBgColor='#FFFFFF';
  mm_menu_0805151542_0.menuBorderBgColor='#015BA7';
}
//-->
</script>
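The menu items built in mmLoadMenus() are not <a> elements on the page; each addMenuItem(label, action) call pairs the visible text, with &nbsp; in place of spaces, with a JavaScript action string. Below is a minimal sketch that lists those pairs with a regular expression. It assumes the script text has already been obtained as a string and that both arguments are double-quoted literals as in the excerpt above; the class name MenuItemParser is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MenuItemParser {
    // Matches addMenuItem("label","action") where both arguments are
    // double-quoted string literals (assumption based on the excerpt).
    private static final Pattern ADD_MENU_ITEM =
            Pattern.compile("addMenuItem\\(\"([^\"]*)\",\\s*\"([^\"]*)\"\\)");

    // Returns label -> JavaScript action, in source order.
    public static Map<String, String> parse(String script) {
        Map<String, String> items = new LinkedHashMap<>();
        Matcher m = ADD_MENU_ITEM.matcher(script);
        while (m.find()) {
            // The labels use &nbsp; instead of plain spaces.
            String label = m.group(1).replace("&nbsp;", " ");
            items.put(label, m.group(2));
        }
        return items;
    }

    public static void main(String[] args) {
        String script =
                "mm_menu_0805151542_0.addMenuItem(\"View&nbsp;Tax&nbsp;Sales\",\"window.open('TCTaxSaleBrief.asp', '_blank');\");\n"
              + "mm_menu_0805151542_0.addMenuItem(\"Online&nbsp;Tag&nbsp;Renewals\",\"location='../TaxCommissioner/TagRenewal.html'\");";
        parse(script).forEach((label, action) ->
                System.out.println(label + " -> " + action));
    }
}
```

Because the labels contain &nbsp; rather than spaces, searching for them as plain text only works after that substitution.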

Here is the pertinent anchor:

<a href="#" name="link11" class="nav" id="link10" onmouseover="MM_showMenu(window.mm_menu_0805151542_0,104,0,null,'link11')" onmouseout="MM_startTimeout();">Online Services</a><br />

Here is the snippet of Java code I am using to try to make this work:

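import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.io.IOException;

WebClient webClient = new WebClient(BrowserVersion.FIREFOX_10);
String webPage = "http://website.html";
try {
    HtmlPage taxComPage = webClient.getPage(webPage);
    // Earlier attempts:
    //HtmlAnchor anchor = taxComPage.getAnchorByText("View Tax Sales");
    //HtmlAnchor htmlAnchor = taxComPage.getHtmlElementById("link10");
    HtmlAnchor anchor = taxComPage.getAnchorByText("Online Services");

    // Clicking only follows href="#"; the menu opens on mouseover, not on click
    HtmlPage page = anchor.click();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    webClient.closeAllWindows();
}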

If HtmlUnit does not work with JavaScript, please let me know!

Thanks

I understand that there is a function called mmLoadMenus() that contains the text displayed on mouseover, but I don't understand how this function is associated with the anchor. The anchor calls something named MM_showMenu. What is MM_showMenu, who created it, and is it a JavaScript keyword? I don't see it defined anywhere: I have searched the whole page, and the only place it is mentioned is in the anchor. It seems to be a function that is passed the arguments window.mm_menu_0805151542_0, 104, 0, null, 'link11'. The only connection I can make between mmLoadMenus() and the anchor is that the anchor contains mm_menu_0805151542_0. I am not well versed in JavaScript, which may be why I can't see how the function and the anchor are connected.
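The connection is the variable name: mmLoadMenus() stores the menu object in the global window.mm_menu_0805151542_0, and the anchor's onmouseover passes that same object as the first argument to MM_showMenu. MM_showMenu is not a JavaScript keyword. Since the page does not define it inline, it presumably comes from an external script file the page includes; the MM_ prefix is the one Macromedia Dreamweaver used for its generated behaviors, so a Dreamweaver pop-up menu script is a likely source (an assumption, not confirmed by the page shown). A minimal sketch of following that link from the attribute to the script, assuming both are available as strings; the class MenuLink is hypothetical:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MenuLink {
    // Captures the global variable passed as the first argument of MM_showMenu(...)
    private static final Pattern SHOW_MENU =
            Pattern.compile("MM_showMenu\\(\\s*window\\.(\\w+)");

    // Returns the menu variable named in an onmouseover attribute, if any.
    public static Optional<String> menuName(String onmouseover) {
        Matcher m = SHOW_MENU.matcher(onmouseover);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True if the script contains an assignment of the form window.NAME = new Menu(...)
    public static boolean definesMenu(String script, String name) {
        Pattern definition = Pattern.compile(
                "window\\." + Pattern.quote(name) + "\\s*=\\s*new\\s+Menu\\(");
        return definition.matcher(script).find();
    }

    public static void main(String[] args) {
        String onmouseover =
                "MM_showMenu(window.mm_menu_0805151542_0,104,0,null,'link11')";
        String script =
                "window.mm_menu_0805151542_0 = new Menu(\"root\",211,23);";
        menuName(onmouseover).ifPresent(name ->
                System.out.println(name + " defined in script: " + definesMenu(script, name)));
    }
}
```

The remaining MM_showMenu arguments (104, 0, null, 'link11') look like positioning values and the name of the anchor the menu is attached to, but their exact meaning depends on the undefined library function.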

The data is already on the page, so why not scrape it from the JavaScript function itself? It's just a matter of parsing out the text, which is much easier than trying to force the menu to load.
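Parsing the text ends with turning each action string into a URL. In the excerpt, the target is either the first single-quoted argument of window.open(...) or the value assigned to location, and relative targets such as TCTaxSaleBrief.asp must be resolved against the page's own URL. A minimal sketch, assuming only those two action forms; the class MenuActionUrl and the base URL are hypothetical, since the question elides the real address:

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MenuActionUrl {
    // Matches window.open('URL', ...) or location='URL' and captures URL
    private static final Pattern TARGET =
            Pattern.compile("(?:window\\.open\\(\\s*|location\\s*=\\s*)'([^']*)'");

    // Returns the absolute target of a menu action, or empty if the form is unrecognized.
    public static Optional<URI> target(String action, URI base) {
        Matcher m = TARGET.matcher(action);
        if (!m.find()) {
            return Optional.empty();
        }
        // Relative targets resolve against the page URL; absolute ones are returned unchanged.
        return Optional.of(base.resolve(m.group(1)));
    }

    public static void main(String[] args) {
        // Hypothetical page URL standing in for the elided real one.
        URI base = URI.create("http://example.com/dept/index.html");
        String[] actions = {
            "window.open('TCTaxSaleBrief.asp', '_blank','width=800,height=580');",
            "location='../TaxCommissioner/TagRenewal.html'",
            "window.open('http://www.dds.ga.gov/', '_blank');"
        };
        for (String action : actions) {
            target(action, base).ifPresent(System.out::println);
        }
    }
}
```

Each resulting URL could then be fetched directly, for example by passing its string form to webClient.getPage(...) from the question's code, instead of simulating the mouseover.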
