
Tika Solr Metadata mapping ignore document title

I have the following config file for Solr:

  <requestHandler name="/update/extract" 
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <!-- All the main content goes into "text"... if you need to return
           the extracted text or do highlighting, use a stored field. -->
      <str name="lowernames">true</str>
      <str name="fmap.content">content</str>
      <str name="fmap.application_name">type</str>
      <str name="fmap.content_type">mime</str>
      <str name="fmap.stream_size">size</str>
      <str name="uprefix">ignored_</str>
      <str name="captureAttr">false</str>
    </lst>
  </requestHandler>

And this is my schema:

   <field name="id" type="string" indexed="true" stored="true" required="true" /> 
   <field name="access_type" type="string" indexed="true" stored="false"/>
   <field name="access_restriction" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="title" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="content" type="text_en_splitting" indexed="true" stored="true"/>
   <field name="created" type="date" indexed="true" stored="true"/>
   <field name="createdby" type="string" indexed="true" stored="true"/>
   <field name="modified" type="date" indexed="true" stored="true"/>
   <field name="modifiedby" type="string" indexed="true" stored="true"/>
   <field name="source" type="string" indexed="true" stored="true" />
   <field name="version" type="string" indexed="true" stored="true" />
   <field name="resourcelink" type="string" indexed="true" stored="true" />
   <field name="downloadlink" type="string" indexed="true" stored="true" />

   <field name="type" type="string" indexed="true" stored="true" />
   <field name="mime" type="string" indexed="true" stored="true" />
   <field name="size" type="string" indexed="true" stored="true" />

I want to set the title myself. But Tika keeps setting its own title (that's why I set multiValued="true" temporarily), which I find strange because I have to manually map fields like stream_size and content_type.

What solutions are there for this issue?

I'd like Tika to override the title I assign, like this:

I have 3 documents. For one of them, Tika doesn't extract a title; in that case, I have my own title, which I set by passing literal.title. When Tika does extract a title, I want it to override the one I passed in literal.title. Is this possible?

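I was dealing with the same problem a while ago, and I hit a wall too :( I let Tika take "title" and used literal.other_title_like_field to store the proper title. It's not the best solution, but it worked for me.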
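A minimal sketch of the schema side of that workaround. The field name custom_title is hypothetical (it stands in for other_title_like_field), and the type and attributes are assumptions copied from the other string fields in the question's schema:

```xml
<!-- "title" is left to Tika and receives whatever title it extracts -->
<field name="title" type="string" indexed="true" stored="true" multiValued="true" />

<!-- Hypothetical extra field for the caller-supplied title,
     filled by passing literal.custom_title=... on the extract request -->
<field name="custom_title" type="string" indexed="true" stored="true" />
```

With this layout the two titles never collide, so the client can decide at query time which one to display.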

For those who are still struggling with this problem, I solved it by adding

<str name="fmap.title">ignored_</str>

in my ExtractingRequestHandler defaults.
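Mapping to ignored_ presumably relies on the schema having a field that accepts that name and discards the value. The schema shown in the question does not list one, so the following is only a sketch, assuming definitions like those in the stock Solr example schema:

```xml
<!-- Assumed: a field type that is neither indexed nor stored,
     so anything routed to it is silently dropped -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true" />

<!-- Assumed: catch-all dynamic field for ignored_ and ignored_* names,
     which is also what uprefix=ignored_ in the handler config targets -->
<dynamicField name="ignored_*" type="ignored" multiValued="true" />
```

If the schema has no matching field, the extract request would likely fail with an unknown-field error rather than drop the Tika title.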
