
Can I crawl with Nutch, store in Cassandra, index using Solr?

Original post: 2014-01-01 13:39:35 | Tags: solr, cassandra, nutch

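I am working on a keyword analysis application. I want to crawl the web with Nutch, index the output with Solr, and finally store the data in Cassandra.

Later, I should be able to run search queries and analysis against Solr, and it must fetch the relevant data from Cassandra.

Is this setup possible? If so, is there anything I should keep in mind?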

3 Answers

If you use DataStax's Cassandra distribution, indexing Cassandra tables into Solr is much easier. Here is the link: http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-solr

I think it is possible, but I am not a Cassandra user, so I have never tried it.

You will have to configure gora.properties ( http://svn.apache.org/repos/asf/nutch/tags/release-2.2.1/conf/gora.properties ) to enable Cassandra. The Nutch 2 Tutorial ( http://wiki.apache.org/nutch/Nutch2Tutorial ) walks through the same step for HBase.
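As a rough illustration of that configuration step, here is a small Python sketch that rewrites the text of a gora.properties file so that it selects a Cassandra store. The two property keys and their values are assumptions based on the entries commonly shown for Nutch 2.x with Gora; check them against the gora.properties file shipped with your release. The helper name `apply_settings` is hypothetical.

```python
from typing import Dict

# Assumed keys and values (not taken from the answer above); verify them
# against the gora.properties bundled with your Nutch 2.x release.
CASSANDRA_SETTINGS = {
    "gora.datastore.default": "org.apache.gora.cassandra.store.CassandraStore",
    "gora.cassandrastore.servers": "localhost:9160",
}


def apply_settings(properties_text: str, settings: Dict[str, str]) -> str:
    """Return properties_text with each key in settings set to its new value.

    Active (uncommented) lines for a known key are replaced in place;
    keys that never appear are appended at the end. Comments are kept.
    """
    remaining = dict(settings)
    out = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else ""
        if not stripped.startswith("#") and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any settings that had no active line in the original file.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "# Default datastore\n"
        "gora.datastore.default=org.apache.gora.hbase.store.HBaseStore\n"
    )
    print(apply_settings(sample, CASSANDRA_SETTINGS), end="")
```

Editing the file by hand works just as well; the point is only that the default datastore entry changes from the HBase store to the Cassandra one.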

To see where the data is mapped in Cassandra, look at the mappings in http://svn.apache.org/repos/asf/nutch/tags/release-2.2.1/conf/gora-cassandra-mapping.xml

Nutch will then store its data in Cassandra. As for Solr, I don't know (I have never used Solr).

It is possible programmatically. Keep the same unique ID in both Cassandra and Solr, query the Solr index to get the matching IDs, and then fetch the full records from Cassandra by those IDs.
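The two-step lookup described in this answer could be sketched as follows. This is a minimal sketch, not a definitive implementation: the function `search_then_fetch` and the in-memory stand-ins are hypothetical, and the sample data is made up. In a real setup the two callables would wrap a Solr client (to return matching IDs) and a Cassandra client (to load a row by ID).

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]


def search_then_fetch(
    query: str,
    search_ids: Callable[[str], Iterable[str]],
    load_record: Callable[[str], Optional[Record]],
) -> List[Record]:
    """Ask the index for matching IDs, then load each full record from the store."""
    records = []
    for doc_id in search_ids(query):
        record = load_record(doc_id)
        # The index can lag behind the store, so skip IDs the store no longer has.
        if record is not None:
            records.append(record)
    return records


# Hypothetical in-memory stand-ins for Solr (term -> IDs) and Cassandra (ID -> row).
FAKE_INDEX = {
    "nutch": ["http://example.com/a", "http://example.com/b"],
    "solr": ["http://example.com/b", "http://example.com/gone"],
}
FAKE_STORE = {
    "http://example.com/a": {"id": "http://example.com/a", "content": "Nutch crawls pages"},
    "http://example.com/b": {"id": "http://example.com/b", "content": "Nutch feeds Solr"},
}


def fake_search_ids(query: str) -> List[str]:
    return FAKE_INDEX.get(query, [])


def fake_load_record(doc_id: str) -> Optional[Record]:
    return FAKE_STORE.get(doc_id)


if __name__ == "__main__":
    for row in search_then_fetch("solr", fake_search_ids, fake_load_record):
        print(row["id"], "->", row["content"])
```

Passing the lookups in as callables keeps the join logic independent of any particular client library. The page URL is used as the shared ID here only as an example; any key works as long as both systems store the same value.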

