How do I index an entire local hard drive into Apache Solr?

Question

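Does Solr, or a client library that feeds Solr, offer a good way to index an entire hard drive? This should include the contents of zip files, recursing into zip files nested inside other zip files.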

It should run on Linux (no Windows-only clients).

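This would of course involve a single scan of the whole filesystem, starting from the root (or really from any folder). At this point I am not concerned with keeping the index up to date, only with creating it initially. It would be similar to the old "Google Desktop" application, which has been discontinued.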

Answer 1

You can drive Solr through the SolrJ API.

Here is the API documentation: http://lucene.apache.org/solr/4_0_0/solr-solrj/index.html

Here is an article on using SolrJ to index files on a hard drive:
http://blog.cloudera.com/blog/2012/03/indexing-files-via-solr-and-java-mapreduce/

Each file is represented by a SolrInputDocument, and you use .addField to attach the fields you want to search on later.
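As a rough illustration of the scan that would feed those fields, here is a minimal sketch using only the JDK. It walks a directory tree once and builds one field map per regular file. The class name FileFieldCollector and the field names id, name and size are hypothetical, chosen for this example; they would have to match whatever your Solr schema defines. With SolrJ on the classpath, each map entry would correspond to one addField(name, value) call on a SolrInputDocument.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects per-file fields during a one-time scan.
class FileFieldCollector {

    /** Walks the tree under root and returns one field map per regular file. */
    static List<Map<String, Object>> collect(Path root) throws IOException {
        final List<Map<String, Object>> docs = new ArrayList<>();
        // walkFileTree does not follow symbolic links by default,
        // which avoids cycles when scanning a whole disk.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    Map<String, Object> fields = new LinkedHashMap<>();
                    // Assumed field names; adjust them to your schema.
                    fields.put("id", file.toAbsolutePath().toString());
                    fields.put("name", file.getFileName().toString());
                    fields.put("size", attrs.size());
                    docs.add(fields);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable entries (permissions, vanished files) instead of aborting.
                return FileVisitResult.CONTINUE;
            }
        });
        return docs;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map<String, Object> fields : collect(root)) {
            System.out.println(fields);
        }
    }
}
```

Collecting everything into a list keeps the sketch short; for a real whole-disk scan you would more likely send documents to Solr in batches from inside visitFile, so memory use stays bounded.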

Here is sample code for the index driver:

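import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// IndexMapper is the mapper class from the linked article; it is not shown here.
public class IndexDriver extends Configured implements Tool {

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new Configuration(),
     new IndexDriver(), args);
    System.exit(exitCode);
  }

  @Override
  public int run(String[] args) throws Exception {
    // Minimal validation: both an input and an output path are required
    if (args.length < 2) {
      System.err.println("Usage: IndexDriver <input path> <output path>");
      return 1;
    }

    JobConf conf = new JobConf(getConf(), IndexDriver.class);
    conf.setJobName("Index Builder - Adam S @ Cloudera");
    conf.setSpeculativeExecution(false);

    // Set Input and Output paths
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Use TextInputFormat
    conf.setInputFormat(TextInputFormat.class);

    // Mapper has no output; map-only job, so no reducers
    conf.setMapperClass(IndexMapper.class);
    conf.setMapOutputKeyClass(NullWritable.class);
    conf.setMapOutputValueClass(NullWritable.class);
    conf.setNumReduceTasks(0);
    JobClient.runJob(conf);
    return 0;
  }
}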

Read the article for more information.

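Zip files: for information on handling zip files, see "Using Solr CELL's ExtractingRequestHandler to index/extract files from package formats".

There appear to be some bugs in Solr's handling of zip files. Here is a bug report with a fix: https://issues.apache.org/jira/browse/SOLR-2416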
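Independently of Solr Cell, nested archives can also be unpacked on the client before anything is sent to Solr. The following is a minimal sketch of that idea using java.util.zip from the JDK; it is not how ExtractingRequestHandler works. The class name NestedZipLister and the "!/" separator used in the reported names are assumptions made for this example. It only lists entry names; in a real indexer you would read each entry's bytes and index them at the point where the name is recorded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: lists files inside a zip, descending into nested zips.
class NestedZipLister {

    /** Returns the names of all non-directory entries, recursing into *.zip entries. */
    static List<String> list(InputStream in, String prefix) throws IOException {
        List<String> names = new ArrayList<>();
        // The caller owns 'in'; the wrapper is deliberately not closed here.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String name = prefix + entry.getName();
            if (entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip")) {
                // Buffer the inner archive so the recursion cannot disturb the outer stream.
                byte[] inner = readAll(zip);
                names.addAll(list(new ByteArrayInputStream(inner), name + "!/"));
            } else {
                names.add(name);
            }
        }
        return names;
    }

    /** Reads the current entry to its end (read returns -1 at the end of an entry). */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : list(in, "")) {
                System.out.println(name);
            }
        }
    }
}
```

Buffering each inner archive in memory keeps the code simple but is a deliberate simplification. For large or untrusted archives you would want a size cap and a depth limit to guard against zip bombs.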

Solution 1
2 2013-10-10 02:03:58