
What's the best way to scrape a web page from an Android application?

I am working on an Android application that needs to fetch an HTML web page and parse it for data to use in the app. I tried Web-Harvest, but it does not seem to be fully compatible with Android. The application should download the page, parse it, extract the needed data, and use it in the app. What is the standard, recommended way to scrape HTML pages on Android?

I've been happy with using TagSoup and XOM to parse webpages on Android. With both in your classpath, you'd do something like:

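import nu.xom.*;                              // Builder, Document, Element, Nodes
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// TagSoup turns messy real-world HTML into SAX events that XOM can consume.
XMLReader tagsoup = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Builder bob = new Builder(tagsoup);
Document html = bob.build("http://www.yahoo.com");   // fetches and parses the page
Nodes images = html.query("//img");

for (int index = 0; index < images.size(); index++) {
    Element image = (Element) images.get(index);
    // getAttributeValue returns null when the attribute is absent
    String src = image.getAttributeValue("src");
    if (src == null) continue;
    // do something with it...
}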
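
One thing you will often want to do with each src is turn it into an absolute URL, since pages commonly use relative paths. Here is a minimal sketch using only java.net.URI. The class name SrcResolver, the method resolveAll, and the example.com addresses are made up for illustration, and it assumes the page URL you pass in includes a path (at least a trailing "/").

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SrcResolver {
    /**
     * Resolves each raw src value against the URL of the page it came from.
     * Null, blank, or syntactically invalid values are skipped.
     */
    static List<String> resolveAll(String pageUrl, List<String> rawSrcValues) {
        URI base = URI.create(pageUrl);
        List<String> resolved = new ArrayList<>();
        for (String raw : rawSrcValues) {
            if (raw == null || raw.trim().isEmpty()) {
                continue;
            }
            try {
                resolved.add(base.resolve(raw.trim()).toString());
            } catch (IllegalArgumentException e) {
                // Not a valid URI reference (for example, it contains a space); skip it.
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("logo.png", "/img/a.gif", "http://cdn.example.org/b.jpg");
        for (String url : resolveAll("http://www.example.com/news/index.html", raw)) {
            System.out.println(url);
        }
    }
}
```

In the loop above you would collect the src strings into a list and pass them to resolveAll together with the address you gave to bob.build.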

If the HTML you're scraping has a namespace, you'd do the below instead:

XPathContext context = new XPathContext("html", "http://www.w3.org/1999/xhtml");
Nodes images = html.query("//html:img", context);
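
To see why the prefix matters, here is a small self-contained sketch. It uses the JDK's own javax.xml.xpath on a well-formed XHTML string rather than XOM and TagSoup, so it runs without extra jars; the class name NamespaceDemo and the method count are made up for illustration. In XPath 1.0 an unprefixed name such as img only matches elements in no namespace, so the plain query finds nothing once the document declares the XHTML namespace.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Counts the nodes matching an XPath expression; the prefix "html" is bound to XHTML_NS. */
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this the parser ignores namespaces
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                // Reverse lookups are not needed for evaluating this expression.
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<img src=\"a.png\"/><img src=\"b.png\"/></body></html>";
        System.out.println(count(page, "//img"));      // prints 0: unprefixed name, no match
        System.out.println(count(page, "//html:img")); // prints 2: prefix bound to XHTML
    }
}
```

XOM's XPathContext plays the same role as the NamespaceContext here: it only tells the XPath engine which namespace URI the prefix stands for, and the prefix itself can be any name you like.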


XOM --> http://www.xom.nu

TagSoup --> http://ccil.org/~cowan/XML/tagsoup/

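Of course, you'll have to handle the exceptions that can occur while building the XML document from the web page: XMLReaderFactory.createXMLReader can throw SAXException, and Builder.build can throw ParsingException or IOException. On current Android versions, also run the download on a background thread, since network access on the main thread raises NetworkOnMainThreadException.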

