Tika Solr Metadata mapping ignore document title
I have the following config file for Solr:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<!-- All the main content goes into "text"... if you need to return
the extracted text or do highlighting, use a stored field. -->
<str name="lowernames">true</str>
<str name="fmap.content">content</str>
<str name="fmap.application_name">type</str>
<str name="fmap.content_type">mime</str>
<str name="fmap.stream_size">size</str>
<str name="uprefix">ignored_</str>
<str name="captureAttr">false</str>
</lst>
</requestHandler>
And this is my schema:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="access_type" type="string" indexed="true" stored="false"/>
<field name="access_restriction" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="title" type="string" indexed="true" stored="true" multiValued="true" />
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="content" type="text_en_splitting" indexed="true" stored="true"/>
<field name="created" type="date" indexed="true" stored="true"/>
<field name="createdby" type="string" indexed="true" stored="true"/>
<field name="modified" type="date" indexed="true" stored="true"/>
<field name="modifiedby" type="string" indexed="true" stored="true"/>
<field name="source" type="string" indexed="true" stored="true" />
<field name="version" type="string" indexed="true" stored="true" />
<field name="resourcelink" type="string" indexed="true" stored="true" />
<field name="downloadlink" type="string" indexed="true" stored="true" />
<field name="type" type="string" indexed="true" stored="true" />
<field name="mime" type="string" indexed="true" stored="true" />
<field name="size" type="string" indexed="true" stored="true" />
I want to set the title myself. But Tika keeps setting its own title (that's why I set multiValued="true" temporarily), which I find strange because I have to manually map things like stream_size and content_type.
What solutions are there for this issue?
I'd like Tika to override the title I assign, like this:
I have 3 documents. For one of them, Tika doesn't extract a title; in that case I have my own title, which I set by passing literal.title. When Tika does extract a title, I want it to override the one I passed in literal.title. Is this possible?
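I was dealing with the same problem a while ago, and I hit a wall too :( I let Tika take "title" and used literal.other_title_like_field to store the proper title. It's not the best solution, but it worked for me.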
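A minimal sketch of what that workaround could look like in the schema. The name other_title_like_field is only the placeholder used in the answer above, and its type and attributes here are assumptions copied from the other string fields in the question's schema:

```xml
<!-- Tika keeps writing its extracted title here. -->
<field name="title" type="string" indexed="true" stored="true" multiValued="true" />

<!-- Hypothetical second field: the caller's own title is sent as
     literal.other_title_like_field=... on the extract request. -->
<field name="other_title_like_field" type="string" indexed="true" stored="true" />
```

With this layout the two values never collide, at the cost of the client having to decide which field to display.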
For those who are still struggling with this problem, I solved it by adding
<str name="fmap.title">ignored_</str>
in my ExtractingRequestHandler defaults.
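As a rough sketch of where that line would go, here is the handler from the question with the mapping added. The placement inside the defaults list is an assumption, as is the note about the dynamic field: the schema excerpt in the question does not show an ignored_* dynamic field, but the existing uprefix setting suggests one is declared.

```xml
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">content</str>
    <str name="fmap.application_name">type</str>
    <str name="fmap.content_type">mime</str>
    <str name="fmap.stream_size">size</str>
    <!-- Per the answer above: send Tika's extracted title to an ignored_
         field instead of the schema's title field. Assumes schema.xml
         declares a dynamic field matching ignored_ names. -->
    <str name="fmap.title">ignored_</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">false</str>
  </lst>
</requestHandler>
```

Note that this discards Tika's title rather than letting it win over literal.title, so it addresses the duplicate-title symptom, not the override behaviour asked about in the question. How the mapping interacts with literal.title may depend on the Solr version, so it is worth checking the indexed document after the change.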