
HTMLUnit not working with Ajax/Javascript

I am trying to extract data from a web page (a page that displays search results) for a course project. Specifically, this page:

http://www.target.com/c/xbox-one-games-video/-/N-55krw#navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to

I only want to extract the product titles.

I am using the following code:

final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage(itemPageURL);
int tries = 20;  // Amount of tries to avoid infinite loop
while (tries > 0) {
    tries--;
    synchronized(page) {
       page.wait(2000);  // How often to check
    }
}
int numThreads = webClient.waitForBackgroundJavaScript(1000000l);

PrintWriter pw = new PrintWriter("test-target-search.txt");
pw.println(page.asXml());
pw.close();

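The resulting page does not contain the product information that a web browser shows. I assume the AJAX calls have not finished? (I am not sure, though.)

Any help would be much appreciated. Thanks!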

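You can use a plain GET request for this kind of task. Paging is controlled by the "pageCount" and "offset" parameters in the URL. After retrieving a page (the example below does this for a single page), you can extract the titles with a regular expression or with whatever fits the response format (JSON?).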

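// HtmlUnit 2.x package names; HtmlUnit 3.x moved these classes to org.htmlunit.
import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import java.net.URL;

// The class name is arbitrary; it only makes the snippet compilable.
public class TargetSearch
{
    public static void main(String[] args)
    {
        try
        {
            WebClient webClient = new WebClient();

            // Query the search service directly instead of loading the HTML page.
            URL url = new URL(
                    "http://tws.target.com/searchservice/item/search_results/v1/by_keyword?callback=getPlpResponse&navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to");
            WebRequest requestSettings = new WebRequest(url, HttpMethod.GET);

            // Extra request headers, including a Referer pointing at the category page.
            requestSettings.setAdditionalHeader("Accept", "*/*");
            requestSettings.setAdditionalHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
            requestSettings.setAdditionalHeader("Referer", "http://www.target.com/c/xbox-one-games-video/-/N-55krw");
            requestSettings.setAdditionalHeader("Accept-Language", "en-US,en;q=0.8");
            requestSettings.setAdditionalHeader("Accept-Encoding", "gzip,deflate,sdch");
            requestSettings.setAdditionalHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");

            Page page = webClient.getPage(requestSettings);

            // Print the raw response body returned by the search service.
            System.out.println(page.getWebResponse().getContentAsString());
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}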
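
To illustrate the paging and title-extraction steps mentioned above, here is a rough sketch that uses only the JDK. It rests on several assumptions that I have not verified against the live service: that "offset" is an item index which advances by "pageCount", that the response is JSON wrapped in a call to the callback named in the URL (getPlpResponse), and that each item carries a field literally named "title". The class and method names (TargetSearchSketch, pageUrl, stripJsonp, extractTitles) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TargetSearchSketch {
    // Query string from the answer's URL, split around the two paging parameters.
    static final String BASE =
        "http://tws.target.com/searchservice/item/search_results/v1/by_keyword"
        + "?callback=getPlpResponse&navigation=true&category=55krw&searchTerm="
        + "&view_type=medium&sort_by=bestselling&faceted_value=";
    static final String TAIL =
        "&response_group=Items&isLeaf=true&parent_category_id=55kug"
        + "&custom_price=false&min_price=from&max_price=to";

    // Matches a JSONP wrapper such as: getPlpResponse({...});
    private static final Pattern JSONP =
        Pattern.compile("^\\s*[\\w$.]+\\s*\\((.*)\\)\\s*;?\\s*$", Pattern.DOTALL);

    // ASSUMPTION: items expose their name in a JSON field called "title".
    private static final Pattern TITLE =
        Pattern.compile("\"title\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Builds the URL for one page of results.
    static String pageUrl(int offset, int pageCount) {
        return BASE + "&offset=" + offset + "&pageCount=" + pageCount + TAIL;
    }

    // Returns the payload inside the JSONP call, or the input unchanged
    // if it is not wrapped.
    static String stripJsonp(String body) {
        Matcher m = JSONP.matcher(body);
        return m.matches() ? m.group(1) : body;
    }

    // Collects every "title" string value. JSON escapes (for example \")
    // are kept as-is and not decoded.
    static List<String> extractTitles(String json) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(json);
        while (m.find()) {
            titles.add(m.group(1));
        }
        return titles;
    }

    public static void main(String[] args) {
        // URLs for the first three pages, 60 items per page.
        for (int offset = 0; offset < 180; offset += 60) {
            System.out.println(pageUrl(offset, 60));
        }
        // Made-up response body, used only to exercise the parsing helpers.
        String sample = "getPlpResponse({\"items\":[{\"title\":\"Game A\"}]});";
        System.out.println(extractTitles(stripJsonp(sample)));
    }
}
```

In practice you would pass each pageUrl(...) result to webClient.getPage(...) as in the code above and feed getContentAsString() into stripJsonp and extractTitles. A regular expression is fragile here; if the payload does turn out to be JSON, a real JSON parser is probably the safer choice.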

