
Apache Solr: How to access and index files from another server

Solr version: 6.6.1

I am new to Apache Solr and am currently exploring how to use it to search inside PDF files.

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#the-tikaentityprocessor

I am able to index PDF files that sit on the same server as Solr by using "BinFileDataSource", as shown in the example below.

Now I want to know whether baseDir can point to a folder on a different server.

Please suggest an example of accessing PDF files on another server. How should I write the path in the baseDir attribute?

<dataConfig>
  <dataSource type="BinFileDataSource"/> <!--Local filesystem-->
  <document>
    <entity name="K2FileEntity" processor="FileListEntityProcessor" dataSource="null"
            recursive = "true"
            baseDir="C:/solr-6.6.1/server/solr/core_K2_Depot/Depot" fileName=".*pdf" rootEntity="false">

            <field column="file" name="id"/>
            <field column="fileLastModified" name="lastmodified" />

              <entity name="pdf" processor="TikaEntityProcessor" onError="skip"
                      url="${K2FileEntity.fileAbsolutePath}" format="text">

                <field column="title" name="title" meta="true"/>
                <field column="dc:format" name="format" meta="true"/>
                <field column="text" name="text"/>

              </entity>
    </entity>
  </document>
</dataConfig>

I finally found the answer on the solr-user mailing list.

Just change baseDir to the folder on the other server (SMB paths work directly):

baseDir="\\CLDServer2\RemoteK2Depot"
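
For reference, here is a sketch of how the complete configuration from the question might look with that change applied. Only baseDir differs from the original; the share name is the one given above. It assumes that the account running Solr has read access to the share, which the mailing-list answer does not mention.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: lists the PDF files under the remote share (UNC path). -->
    <!-- Assumption: the account running Solr can read \\CLDServer2\RemoteK2Depot. -->
    <entity name="K2FileEntity" processor="FileListEntityProcessor" dataSource="null"
            recursive="true"
            baseDir="\\CLDServer2\RemoteK2Depot" fileName=".*pdf" rootEntity="false">

      <field column="file" name="id"/>
      <field column="fileLastModified" name="lastmodified"/>

      <!-- Inner entity: extracts text and metadata from each listed file with Tika. -->
      <entity name="pdf" processor="TikaEntityProcessor" onError="skip"
              url="${K2FileEntity.fileAbsolutePath}" format="text">

        <field column="title" name="title" meta="true"/>
        <field column="dc:format" name="format" meta="true"/>
        <field column="text" name="text"/>

      </entity>
    </entity>
  </document>
</dataConfig>
```

If the import returns no documents, it may be worth checking that the same UNC path opens from the Solr host under the account Solr runs as before changing anything else in the config.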
