
Solr PDF search highlighting issue

Solr v6.5: I have two PDF files indexed in a Solr core. When I search for a keyword, the matching document is found, but highlighting works for one document and not for the other. For example, when I search for "panic", which appears in one of the documents, I get the search result with highlighting. When I search for "epsilon", the result says the term was found and returns the document information, but no highlighting is returned for that document. Here is what has been added/changed in managed_schema.xml:

    .
    .
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
.
. 
    <field name="_text_" type="text_general" multiValued="true" indexed="true" stored="true"/>
    <field name="content" type="text_general" multiValued="true" indexed="true" stored="true"/>
    .
    .
    <copyField source="content" dest="_text_"/>

The relevant solrconfig.xml snippet is as follows:

.
.
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.meta">ignored_</str>
      <str name="fmap.content">_text_</str>
    </lst>
  </requestHandler>
.
.

Adding the parameter

hl.maxAnalyzedChars=aLargeEnoughValue

to the query gives me highlighting for search terms that occur farther down the document. The default value of this parameter is 51200.
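
As a rough illustration of such a query, the Python sketch below only builds the request URL. The host, port, core name ("pdfs"), and the value 1000000 are placeholders assumed for the example, not values from the original setup; `_text_` is used as the highlight field because the extract handler above maps content into it.

```python
from urllib.parse import urlencode

# Assumption: host, port and core name ("pdfs") are placeholders for illustration.
SOLR_SELECT_URL = "http://localhost:8983/solr/pdfs/select"


def build_highlight_query(term, max_analyzed_chars=1_000_000, field="_text_"):
    """Build a /select URL that enables highlighting on `field` and raises
    hl.maxAnalyzedChars above its default of 51200."""
    params = {
        "q": f"{field}:{term}",                     # search the extracted text field
        "hl": "true",                               # turn highlighting on
        "hl.fl": field,                             # field to generate snippets from
        "hl.maxAnalyzedChars": max_analyzed_chars,  # how far into the text to look
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = build_highlight_query("epsilon")
print(url)
```

The limit is passed per request here, so it can be tuned for individual queries without touching the server configuration.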

Take-away: When large documents are indexed in Solr, a search can return a match while the highlighting for that document comes back empty. This happens when the searched word occurs farther down the document than the highlighter analyzes, that is, beyond the first hl.maxAnalyzedChars characters. Increasing the value of hl.maxAnalyzedChars fixes it.
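
The effect can be mimicked with a small toy model in Python. This is not Solr's highlighter, only a sketch of the idea that snippets are generated from a limited prefix of the stored text; the function name `toy_highlight` and the sample document are made up for the example.

```python
DEFAULT_MAX_ANALYZED_CHARS = 51200  # Solr's default for hl.maxAnalyzedChars


def toy_highlight(text, term, max_analyzed_chars=DEFAULT_MAX_ANALYZED_CHARS):
    """Toy model: look for `term` only in the first `max_analyzed_chars`
    characters and return a snippet with <em> tags, or None if not found."""
    window = text[:max_analyzed_chars]
    pos = window.lower().find(term.lower())
    if pos == -1:
        return None  # the term lies beyond the analyzed window
    end = pos + len(term)
    return (window[max(0, pos - 20):pos]
            + "<em>" + window[pos:end] + "</em>"
            + window[end:end + 20])


# Hypothetical large document: "panic" near the start, "epsilon" ~70,000 chars in.
doc = "panic " + "filler " * 10000 + "epsilon"

early = toy_highlight(doc, "panic")                                # snippet found
late_default = toy_highlight(doc, "epsilon")                       # None
late_raised = toy_highlight(doc, "epsilon", max_analyzed_chars=100_000)
```

With the default window, `late_default` is `None` even though the term is in the document, which matches the symptom in the question; raising the limit makes the snippet appear.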
