Solr PDF search highlighting issue

Question

Solr v6.5: I have two PDF files indexed in a Solr core. When I search for a keyword, the matching document is found; however, highlighting works for one document and not the other. For example, when I search for "panic", which appears in one of the documents, I get the search result with highlighting. But when I search for "epsilon", the result says the term was found and returns the document information, yet no highlighting comes back for that document. Here is what has been added/changed in managed_schema.xml:

    .
    .
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    .
    .
    <field name="_text_" type="text_general" multiValued="true" indexed="true" stored="true"/>
    <field name="content" type="text_general" multiValued="true" indexed="true" stored="true"/>
    .
    .
    <copyField source="content" dest="_text_"/>

The relevant solrconfig.xml snippet is as follows:

    .
    .
    <requestHandler name="/update/extract"
                    startup="lazy"
                    class="solr.extraction.ExtractingRequestHandler">
      <lst name="defaults">
        <str name="lowernames">true</str>
        <str name="fmap.meta">ignored_</str>
        <str name="fmap.content">_text_</str>
      </lst>
    </requestHandler>
    .
    .

Answer 1

I added the

hl.maxAnalyzedChars=aLargeEnoughValue

parameter to the query, and it now gives me highlighting for search terms that occur farther down the document. The default value for this parameter is 51200.
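If the larger limit is wanted on every request rather than per query, one option is to set it as a default on the search handler in solrconfig.xml. The fragment below is only a sketch under assumptions: it assumes the stock /select handler is the one being queried, that the field to highlight is _text_ (where fmap.content sends the extracted PDF body in the question), and the value 1000000 is an arbitrary illustration, not a recommended setting.

```xml
<!-- Sketch only: handler name, hl.fl field, and the limit value are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- turn highlighting on by default -->
    <str name="hl">true</str>
    <!-- field to build snippets from; assumed to be _text_ here -->
    <str name="hl.fl">_text_</str>
    <!-- analyze more than the default 51200 characters per field -->
    <int name="hl.maxAnalyzedChars">1000000</int>
  </lst>
</requestHandler>
```

A value passed in the query string still overrides a default set this way, so individual requests can raise or lower the limit.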

Takeaway: a large document indexed in Solr can match a search and still come back with empty highlighting. This happens when the search term occurs beyond the first hl.maxAnalyzedChars characters of the field, because the highlighter only analyzes that much text when looking for snippets. Increasing the value of hl.maxAnalyzedChars fixes it.

solution1
0 2017-04-25 13:43:28