
Can't reach new page after submit button click() in HtmlUnit

The problem is the following: when I run this code, it gets as far as submitButton.fireEvent("onclick").getNewPage() and then seems to stop; the final System.out.println(pageAfterLogin.getUrl().toString()) is never executed. No error is reported during execution.

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.util.List;

public class WebScraperHTMLUnit2 {

public static void main(String[] args) {
     try{
        WebClient wc = new WebClient();
        HtmlPage page = wc.getPage("https://www.google.com/");

        HtmlInput searchForm = (HtmlInput)page.getFirstByXPath("//input[@name='q']");
        searchForm.setValueAttribute("q");

        HtmlElement submitButton = page.getFirstByXPath("//button[@id='searchButton']");
        HtmlPage pageAfterLogin = (HtmlPage) submitButton.fireEvent("onclick").getNewPage();

        System.out.println(pageAfterLogin.getUrl().toString());   

    } catch (Exception ex) {}       
}    
}

Here is output log from NetBeans:

run:
Dec 16, 2016 2:38:16 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'https://www.google.ru/' [1:14018] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
Dec 16, 2016 2:38:16 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'https://www.google.ru/' [1:14042] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
Dec 16, 2016 2:38:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
BUILD SUCCESSFUL (total time: 3 seconds)

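The XPath for the button is incorrect: getFirstByXPath returns null when nothing matches, and the empty catch block then hides the resulting NullPointerException. The button is actually: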

<input value="Google Search" aria-label="Google Search" name="btnK" type="submit" jsaction="sf.chk">

Your code should be something like:

try {
    final WebClient wc = new WebClient();
    // Keep going when the page's JavaScript throws instead of aborting.
    wc.getOptions().setThrowExceptionOnScriptError(false);

    HtmlPage page = wc.getPage("https://www.google.com/");

    HtmlInput searchForm = page.getFirstByXPath("//input[@name='q']");
    searchForm.setValueAttribute("q");

    // Requires: import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
    HtmlSubmitInput submitButton = page.getFirstByXPath("//input[@name='btnK']");
    HtmlPage pageAfterLogin = submitButton.click();

    System.out.println(pageAfterLogin.getUrl().toString());
} catch (Exception e) {
    // Print the failure rather than swallowing it silently.
    e.printStackTrace();
}

You need setThrowExceptionOnScriptError(false) because a script error is thrown on this page (for unknown reasons), and you don't want it to stop your code from executing.
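Script errors aside, the empty catch block in the question is what lets the program finish with no visible error: any exception thrown inside the try is simply discarded. Below is a minimal sketch in plain Java with no HtmlUnit dependency. The findFirst method and the element map are hypothetical stand-ins for a lookup that returns null when nothing matches, which is what getFirstByXPath does for a non-matching XPath.

```java
import java.util.HashMap;
import java.util.Map;

public class SwallowedExceptionDemo {

    // Hypothetical stand-in for a lookup that returns null when nothing matches.
    static String findFirst(Map<String, String> elements, String key) {
        return elements.get(key);
    }

    // Mirrors the question: the failure is caught and discarded.
    static String withEmptyCatch(Map<String, String> elements) {
        String result = "nothing printed";
        try {
            String button = findFirst(elements, "searchButton");
            result = button.toUpperCase(); // NullPointerException when button is null
        } catch (Exception ex) {
            // Empty on purpose: the exception disappears here.
        }
        return result;
    }

    // Same logic, but the catch block reports what went wrong.
    static String withReportingCatch(Map<String, String> elements) {
        try {
            String button = findFirst(elements, "searchButton");
            return button.toUpperCase();
        } catch (Exception ex) {
            return "failed: " + ex.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Map<String, String> elements = new HashMap<>();
        elements.put("btnK", "Google Search"); // only this key exists

        System.out.println(withEmptyCatch(elements));
        System.out.println(withReportingCatch(elements));
    }
}
```

With the reporting variant, the missing element shows up as a NullPointerException instead of a silent exit, which points straight at the lookup that returned null.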

According to this post, the generated HTML on www.google.com keeps changing, so my //input[@name='btnK'] XPath might not work in the future.
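Since the markup can change, it may help to check whether an XPath expression matches anything before using the result. The sketch below uses only the JDK's javax.xml.xpath, not HtmlUnit. The MARKUP string is the button quoted above, self-closed and wrapped in a form element so that it parses as XML (an assumption made for this example); the class and method names are likewise hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathMatchCheck {

    // The submit button from the answer, closed and wrapped so it is well-formed XML.
    static final String MARKUP =
        "<form><input value=\"Google Search\" aria-label=\"Google Search\" "
        + "name=\"btnK\" type=\"submit\" jsaction=\"sf.chk\"/></form>";

    // Returns true if the expression selects at least one node in MARKUP.
    static boolean matches(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(MARKUP)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
            return node != null; // null means nothing matched
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The XPath from the question finds nothing in this markup.
        System.out.println(matches("//button[@id='searchButton']")); // false
        // The XPath from the answer finds the submit input.
        System.out.println(matches("//input[@name='btnK']"));        // true
    }
}
```

In HtmlUnit code the same idea applies: test the value returned by getFirstByXPath for null before calling click() on it, and report a clear message if the element is missing.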
