Solr 8.4.1 cloud: bin/post - File not Found problem

I am new to Solr and have been working through the 8.4.0 tutorial. Having successfully followed the techproducts example using SolrCloud, I'm now trying to use a schemaless approach to index some PDF files. For that, I used the following, again from the tutorial, to index several files stored in the ~/Documents/pdf folder:

bin/solr create -c localpdf -s 2 -rf 2
bin/post -c localpdf ~/Documents/pdf

When executing the above, I get the following error:

SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>

</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/localpdf/update/extract. Reason:
<pre>    Not Found</pre></p>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/localpdf/update/extract?resource.name=%2Fhome%2Fuser%2FDocuments%2Fpdf%2Ftest234.pdf&literal.id=%2Fhome%2Fuser%2FDocuments%2Fpdf%2Ftest234.pdf

Running the same command with techproducts, i.e. running:

bin/post -c techproducts ~/Documents/pdf

at least finds the files (it gives me some other errors related to PDFBox and some fonts, but that's another matter).

I can add other files to localpdf, for instance XML from the example/exampledocs folder, but not the PDFs.

What am I missing here?

You must configure your core / collection to load the extracting request handler; otherwise it's not available. The techproducts core does this by default. Add the jars to the list of jars to load in solrconfig.xml:

<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />

And add the request handler definition (from the guide linked above):

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">ignored_</str>
  </lst>
  <!--Optional.  Specify a path to a tika configuration file. See the Tika docs for details.-->
  <str name="tika.config">/my/path/to/tika.config</str>
  <!-- Optional. Specify one or more date formats to parse. See DateUtil.DEFAULT_DATE_FORMATS
       for default date formats -->
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
  <!-- Optional. Specify an external file containing parser-specific properties.
       This file is located in the same directory as solrconfig.xml by default.-->
  <str name="parseContext.config">parseContext.xml</str>
</requestHandler>
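
Since the collection runs in SolrCloud mode, the solrconfig.xml that is actually used lives in ZooKeeper, so the edited file has to be uploaded and the collection reloaded before /update/extract becomes available. A minimal sketch of that round trip follows. It assumes the embedded ZooKeeper at localhost:9983, a configset named after the collection (localpdf), and a scratch directory /tmp/localpdf_conf; none of these come from the question, so adjust them to your setup. The `run` helper and `DRY_RUN` switch are illustrative additions: by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: update the configset of a SolrCloud collection and reload it.
# Assumed values (not from the original post): ZK_HOST, CONF_DIR, configset name.
COLLECTION="localpdf"
ZK_HOST="localhost:9983"            # embedded ZooKeeper, assumed
CONF_DIR="/tmp/${COLLECTION}_conf"  # hypothetical scratch directory
SOLR_URL="http://localhost:8983/solr"

# By default only print each command; run with DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# 1. Download the configset currently stored in ZooKeeper.
run bin/solr zk downconfig -z "$ZK_HOST" -n "$COLLECTION" -d "$CONF_DIR"

# 2. Edit solrconfig.xml in the downloaded directory (typically under
#    $CONF_DIR/conf/) and add the <lib> lines and the requestHandler above.

# 3. Upload the modified configset back to ZooKeeper.
run bin/solr zk upconfig -z "$ZK_HOST" -n "$COLLECTION" -d "$CONF_DIR"

# 4. Reload the collection so every replica picks up the new configuration.
run curl "$SOLR_URL/admin/collections?action=RELOAD&name=$COLLECTION"

# 5. Retry the indexing command from the question.
run bin/post -c "$COLLECTION" "$HOME/Documents/pdf"
```

Run it from the Solr installation directory. If the reload succeeds, the 404 for /solr/localpdf/update/extract should no longer appear when posting the PDFs.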
