How to parse documents using crawler4j

Question

I want to parse all the documents that contain some text I enter as a "query", using crawler4j in Eclipse.

Any ideas?

Answer 1

Not really a "direct" answer, but I have also been playing with crawling these last few days. I looked at crawler4j first, then stumbled on jsoup. I did not play much with the crawler, but jsoup turned out to be quite an easy tool for parsing, hence my suggestion. I guess a crawler is good if you really need to crawl part of the web, but jsoup really seems to shine as a parser: selecting nodes works much like it does in jQuery. So perhaps use the crawler to collect the documents first, then parse them with jsoup. Here's a quick example:

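    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;
    // Fetch the page (5-second timeout), then select every <li> element.
    Document doc = Jsoup.connect("http://example.com").userAgent("Mozilla").timeout(5000).get();
    Elements els = doc.select("li");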
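
To get back to the "query" part of the question: once the text of each page is available, keeping only the documents that contain the query is a plain string check. The sketch below is only one way to do it and uses nothing but the JDK. `QueryFilter`, `matches` and `matchingUrls` are hypothetical names, not part of crawler4j or jsoup, and the map of URL to page text is assumed to have been filled in beforehand by whatever collects the pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of crawler4j or jsoup.
final class QueryFilter {

    // True if the page text contains the query, ignoring case.
    // A null or empty query (or null text) never matches.
    static boolean matches(String text, String query) {
        if (text == null || query == null || query.isEmpty()) {
            return false;
        }
        return text.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
    }

    // Returns the URLs whose text matches the query, in the map's iteration order.
    static List<String> matchingUrls(Map<String, String> textByUrl, String query) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : textByUrl.entrySet()) {
            if (matches(entry.getValue(), query)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

With jsoup the text could come from `doc.text()`. With crawler4j it could come from `HtmlParseData.getText()` inside your crawler's `visit(Page page)` method, so you would only hand matching pages on to the parser. The substring match is deliberately naive: it does no tokenizing or stemming, so swap in something smarter if you need real search behaviour.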

Answer 1: accepted, 2015-03-20 16:31:57
