Scraping a web page using HtmlUnit

Here is my code. I want to access a website and compare my data. I want Java to put the data into the fields, automatically click the Calculate button, and return the answer to Java.

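import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlRadioButtonInput;
import com.gargoylesoftware.htmlunit.html.HtmlSpan;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;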

public class MyWebServiceAccess {

  public static void main(String[] args) throws Exception 
  {

        final WebClient webClient = new WebClient();

        final HtmlPage page = webClient.getPage("https://www.socscistatistics.com/tests/signedranks/Default2.aspx");

        // Inputs
        HtmlTextInput treatment1 = (HtmlTextInput) page.getElementById("ctl00_MainContent_TextBox1");
        HtmlTextInput treatment2 = (HtmlTextInput) page.getElementById("ctl00_MainContent_TextBox2");

        // Significance Level:
        HtmlRadioButtonInput s1= (HtmlRadioButtonInput) page.getElementById("ctl00_MainContent_RadioButtonList1_0");
        HtmlRadioButtonInput s2= (HtmlRadioButtonInput) page.getElementById("ctl00_MainContent_RadioButtonList1_1");


        // 1 or 2-tailed hypothesis?:
        HtmlRadioButtonInput t1= (HtmlRadioButtonInput) page.getElementById("ctl00_MainContent_RadioButtonList2_0");
        HtmlRadioButtonInput t2= (HtmlRadioButtonInput) page.getElementById("ctl00_MainContent_RadioButtonList2_1");

        // Calculate
        HtmlSubmitInput Calculate= (HtmlSubmitInput) page.getElementById("ctl00_MainContent_Button2");

        // Result Span
        HtmlSpan result = (HtmlSpan) page.getElementById("ctl00_MainContent_Label9");

        // Fill in Inputs 
        treatment1.setValueAttribute("");
        treatment2.setValueAttribute("");

        s1.setChecked(true);
        s2.setChecked(false);

        t1.setChecked(true);
        t2.setChecked(false);

        Calculate.click();

        // Printing the Output
        System.out.println(result.asText());

        webClient.closeAllWindows();

   }

}

When scraping web pages, you need a basic understanding of how web technologies (HTTP/HTML) work, and you also need some Java/programming knowledge. At the very least, that knowledge helps a lot when you have to find problems in your programs.

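First, your code throws a ClassCastException because the input fields are text areas (<textarea>), not text input controls, so they have to be cast to HtmlTextArea instead of HtmlTextInput. Second, you have to click the right button: your code clicks the 'Reset' button (ctl00_MainContent_Button2), while the Calculate button is ctl00_MainContent_Button1. Finally, clicking the button returns a new page, and your result is on that new page, not on the one you started with.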
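The first problem can be reproduced without HtmlUnit. The sketch below is a plain-Java illustration, not HtmlUnit code: `Element`, `TextInput` and `TextArea` are hypothetical stand-in classes that mimic a text input and a textarea sharing a common parent type, and `checkedCast` is a hypothetical helper that checks the runtime type before casting so the error names the element class actually found.

```java
// Hypothetical stand-ins, NOT HtmlUnit classes: both controls share a parent type,
// just as HtmlTextInput and HtmlTextArea both come back from getElementById.
abstract class Element {}
class TextInput extends Element {}   // plays the role of <input type="text">
class TextArea extends Element {}    // plays the role of <textarea>

public class CastDemo {

    // Hypothetical helper: verify the runtime type first, then cast,
    // so a mismatch reports what the page really contained.
    public static <T extends Element> T checkedCast(Element e, Class<T> type) {
        if (!type.isInstance(e)) {
            throw new IllegalStateException("expected " + type.getSimpleName()
                    + " but found " + e.getClass().getSimpleName());
        }
        return type.cast(e);
    }

    public static void main(String[] args) {
        Element field = new TextArea(); // what the page actually contains

        try {
            // Same mistake as the question: compiles, but fails at runtime
            TextInput wrong = (TextInput) field;
            System.out.println("unexpected: " + wrong);
        } catch (ClassCastException ex) {
            System.out.println("ClassCastException");
        }

        // Casting to the real element type works
        TextArea right = checkedCast(field, TextArea.class);
        System.out.println("ok: " + right.getClass().getSimpleName());
    }
}
```

Running it prints `ClassCastException` followed by `ok: TextArea`. With HtmlUnit the same idea applies: cast to the type of the element the page really serves, which here is HtmlTextArea.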

Hope that helps.

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlRadioButtonInput;
import com.gargoylesoftware.htmlunit.html.HtmlSpan;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextArea;

public class MyWebServiceAccess {

    public static void main(String[] args) throws Exception {
        String url = "https://www.socscistatistics.com/tests/signedranks/Default2.aspx";

        // try-with-resources closes the WebClient when we are done
        try (final WebClient webClient = new WebClient()) {
            HtmlPage page = webClient.getPage(url);

            // Inputs: the data fields are <textarea> elements, not text inputs
            HtmlTextArea treatment1 = (HtmlTextArea) page.getElementById("ctl00_MainContent_TextBox1");
            HtmlTextArea treatment2 = (HtmlTextArea) page.getElementById("ctl00_MainContent_TextBox2");

            // Significance Level:
            HtmlRadioButtonInput s1 = (HtmlRadioButtonInput) page.getElementById("ctl00_MainContent_RadioButtonList1_0");
            HtmlRadioButtonInput s2 = (HtmlRadioButtonInput) page.getElementById("ctl00_MainContent_RadioButtonList1_1");
            s1.setChecked(true);
            s2.setChecked(false);

            // 1 or 2-tailed hypothesis?:
            HtmlRadioButtonInput t1 = (HtmlRadioButtonInput) page.getElementById("ctl00_MainContent_RadioButtonList2_0");
            HtmlRadioButtonInput t2 = (HtmlRadioButtonInput) page.getElementById("ctl00_MainContent_RadioButtonList2_1");
            t1.setChecked(true);
            t2.setChecked(false);

            // Fill in Inputs: one value per line
            treatment1.type("4\n3\n2\n5\n5\n3");
            treatment2.type("1\n2\n3\n0\n0\n2");

            // Clicking Calculate (Button1, not the Reset button) creates a new page
            HtmlSubmitInput calculate = (HtmlSubmitInput) page.getElementById("ctl00_MainContent_Button1");
            page = calculate.click();

            // Result Span: read it from the new page
            HtmlSpan result = (HtmlSpan) page.getElementById("ctl00_MainContent_Label9");

            // Printing the Output
            System.out.println(result.asText());
        }
    }
}
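Since the goal is to feed your own data into the form, the hard-coded strings passed to `type(...)` can be built from Java arrays. The sketch below is a hypothetical helper (`TextAreaInput.toLines` is not part of HtmlUnit), assuming the site accepts one value per line, which is the layout the code above types into each textarea.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class TextAreaInput {

    // Hypothetical helper: joins the values with "\n", one value per line,
    // matching the strings typed into the textareas above.
    public static String toLines(double... values) {
        return Arrays.stream(values)
                .mapToObj(TextAreaInput::format)
                .collect(Collectors.joining("\n"));
    }

    // Writes whole numbers without a trailing ".0" (4.0 -> "4"); others unchanged.
    private static String format(double v) {
        boolean whole = !Double.isInfinite(v) && v == Math.rint(v);
        return whole ? Long.toString((long) v) : Double.toString(v);
    }

    public static void main(String[] args) {
        String treatment1 = toLines(4, 3, 2, 5, 5, 3);
        String treatment2 = toLines(1, 2, 3, 0, 0, 2);
        System.out.println(treatment1.equals("4\n3\n2\n5\n5\n3")); // true
        System.out.println(treatment2.replace("\n", ","));       // 1,2,3,0,0,2
    }
}
```

In the code above, `treatment1.type("4\n3\n2\n5\n5\n3")` could then become `treatment1.type(TextAreaInput.toLines(4, 3, 2, 5, 5, 3))`, or take an array loaded from your own data.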
