What is the best way to scrape a web page from an Android application?

Question

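I am working on an Android application that needs to fetch an HTML web page, parse it, extract the data it needs, and use that data inside the app. I tried Web-Harvest, but it does not seem to be fully compatible with Android. What is the standard, recommended way to scrape an HTML page on Android?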

Answer 1

I have been very happy using TagSoup and XOM to parse web pages on Android. With both libraries on your classpath, you would do something like this:

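import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Nodes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// TagSoup turns real-world HTML into well-formed SAX events that XOM can consume
XMLReader tagsoup = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Builder bob = new Builder(tagsoup);
Document html = bob.build("http://www.yahoo.com");
Nodes images = html.query("//img");

for (int index = 0; index < images.size(); index++) {
    Element image = (Element) images.get(index);
    String src = image.getAttributeValue("src"); // null if the img has no src attribute
    // do something with it...
}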

If the HTML you are scraping uses namespaces, you should do the following instead:

XPathContext context = new XPathContext("html", "http://www.w3.org/1999/xhtml");
Nodes images = html.query("//html:img", context);
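To see why the prefix matters, here is a minimal sketch that uses the JDK's built-in DOM and XPath classes rather than XOM, so it runs without extra libraries. The class name NamespaceQueryDemo and the inline sample page are made up for illustration. In XPath 1.0 an unprefixed name such as img only matches elements in no namespace, so the query has to bind a prefix to the XHTML namespace URI.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceQueryDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample page: the default namespace puts every element in XHTML_NS.
    static final String PAGE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<img src=\"a.png\"/><img src=\"b.png\"/></body></html>";

    /** Counts the nodes matched by an XPath expression, with the "html" prefix bound to XHTML_NS. */
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the parser drops namespace information otherwise
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(PAGE)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                // Reverse lookups are not needed for evaluating these expressions.
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Sample page failed to parse or query", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//img"));      // unqualified name: matches nothing
        System.out.println(count("//html:img")); // qualified name: matches both images
    }
}
```

The XPathContext in the XOM snippet above plays the same role as the NamespaceContext here: it tells the query engine which URI the html prefix stands for.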

Links:

XOM: http://www.xom.nu

TagSoup: http://ccil.org/~cowan/XML/tagsoup/

Of course, you must catch the exceptions that can be thrown while building the XML document from the web page.
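As a rough sketch of that error handling, the example below again uses only the JDK parser instead of XOM, so the exception types differ: with XOM, Builder.build declares ParsingException and IOException, while the JDK's DocumentBuilder throws SAXException and IOException. The class name SafeParseDemo and the choice to return an empty list on failure are assumptions made for illustration; a real app would more likely report the failure to the user.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SafeParseDemo {
    /** Returns the src value of every img element, or an empty list if parsing fails. */
    static List<String> imageSources(String markup) {
        List<String> sources = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            NodeList images = doc.getElementsByTagName("img");
            for (int i = 0; i < images.getLength(); i++) {
                Element image = (Element) images.item(i);
                if (image.hasAttribute("src")) { // skip images without a src
                    sources.add(image.getAttribute("src"));
                }
            }
        } catch (SAXException e) {
            // The markup was not well-formed XML.
            System.err.println("Could not parse page: " + e.getMessage());
        } catch (IOException | ParserConfigurationException e) {
            // Reading the input failed, or no parser could be created.
            System.err.println("Could not read page: " + e.getMessage());
        }
        return sources;
    }

    public static void main(String[] args) {
        System.out.println(imageSources("<html><body><img src=\"a.png\"/></body></html>"));
        System.out.println(imageSources("<html><body><img src=\"a.png\"></body></html>"));
    }
}
```

The second call in main passes an unclosed img tag, which a strict XML parser rejects. This is the kind of input TagSoup is meant to repair before XOM sees it, so with the TagSoup setup the failures to expect are mostly network errors.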
