How to extract data from AJAX/JavaScript websites using HtmlUnit?
I'm trying to extract the shipment history from this page: http://www.aramex.com/express/track-results.aspx?q=aWQ9MzU2NDQ4MTQ3Jg%3d%3d-ULINyZQtKrw%3d
This is my code:
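import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlButtonInput;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextArea;
import java.io.IOException;
import java.util.List;

public void aramexTracking() {
    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    String trackingId = "9181468833";
    HtmlPage page1, page2;
    try {
        // Configure the client before the first request so the options apply to it
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setPrintContentOnFailingStatusCode(false);
        webClient.setCssErrorHandler(new com.gargoylesoftware.htmlunit.SilentCssErrorHandler());
        page1 = webClient.getPage("http://www.aramex.com/express/track.aspx");
        // Submit the form on the tracking page
        HtmlForm form = page1.getFormByName("aspnetForm");
        HtmlButtonInput button = form.getInputByName("ctl00$ctl00$MainContent$InnerMainContent$btnGo");
        HtmlTextArea textArea = form.getTextAreaByName("ShipmentNumber");
        textArea.setText(trackingId);
        page2 = button.click();
        // Text nodes directly under the results container
        List<?> list = page2.getByXPath("//div[@id='dvSearchResults']/text()");
    } catch (FailingHttpStatusCodeException | IOException e) {
        e.printStackTrace();
    }
}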
Please post a valid tracking number. I tried a random one, 3974937493, and want to suggest another XPath:
HtmlTable table = (HtmlTable) page2.getFirstByXPath("//div[@id='MainContent']//table//table");
After that, parse the rows of the table as usual:
if (table.getCellAt(1,0) != null) System.out.println(table.getCellAt(1,0).asText());
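If the table is filled in by AJAX, the XPath lookup may return null simply because the scripts have not finished yet. HtmlUnit's WebClient has waitForBackgroundJavaScript(long) for that case; as an alternative, here is a rough sketch of a plain polling helper that retries a lookup until it returns something or a timeout expires. The class AjaxWait and the method pollUntilPresent are hypothetical names made up for this example and are not part of HtmlUnit, and the timeout values are only guesses.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper (not part of HtmlUnit): retries a lookup until it yields a non-null value. */
final class AjaxWait {
    private AjaxWait() {}

    /**
     * Calls lookup repeatedly, sleeping intervalMillis between attempts,
     * until it returns non-null or timeoutMillis has elapsed.
     * Returns Optional.empty() on timeout or if the thread is interrupted.
     */
    static <T> Optional<T> pollUntilPresent(Supplier<T> lookup, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T value = lookup.get();
            if (value != null) {
                return Optional.of(value);
            }
            // Subtraction keeps the comparison correct even if nanoTime wraps around
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty();
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
                return Optional.empty();
            }
        }
    }
}
```

With the code above it could be used roughly like this, assuming page2 is the page returned by button.click():
Optional<Object> table = AjaxWait.pollUntilPresent(() -> page2.getFirstByXPath("//div[@id='MainContent']//table//table"), 10000, 250);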