No “content” field created when indexing PDF with solr

Question

I have successfully indexed PDFs using the POST command as described at the following link: http://makble.com/how-to-extract-text-from-pdf-and-post-into-solr

Terms contained in an indexed PDF file can be found with general queries or by querying the text field.

However, I do not see a "content" field in the indexed documents, although the other PDF-related fields are generated. I tried editing the managed-schema file to add the following:

<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>

<copyField source="content" dest="text"/>

I get the following error when I attempt to reload the core:

<str name="msg">Error handling 'reload' action</str>
<str name="trace">
org.apache.solr.common.SolrException: Error handling 'reload' action
    at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:110)
    at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370)
    at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)

My solrconfig.xml has this:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

I would like the "content" field to be available so that I can search only the text extracted from the indexed PDF files.

Answer 1

1) Do not edit the schema file manually. Use the Schema API instead.

2) In your configuration, fmap.content maps the extracted content field to the _text_ field, so no separate content field is populated. If you already have a content field defined, removing this parameter from the ExtractingRequestHandler definition should do the job.
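For point 1, here is a minimal sketch of what the Schema API call could look like from Python, using only the standard library. The base URL and the core name "mycore" are assumptions, not values from the question, so adjust them to your setup. The field definition follows the question, except that indexed is set to true: a field with indexed="false" cannot be queried directly, only through the copyField destination. The copy-field destination "text" is also taken from the question and must already exist in the schema (it may need to be _text_ in your case).

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local Solr address
CORE = "mycore"                          # hypothetical core name


def build_schema_commands():
    """Return Schema API commands that define 'content' and copy it into 'text'."""
    return {
        "add-field": {
            "name": "content",
            "type": "text_general",
            "indexed": True,   # needed to run queries such as content:foo
            "stored": True,
            "multiValued": True,
        },
        # Destination copied from the question; it must already exist.
        "add-copy-field": {"source": "content", "dest": "text"},
    }


def post_schema(commands, solr_url=SOLR_URL, core=CORE):
    """POST the commands to the core's /schema endpoint and return the parsed reply."""
    request = urllib.request.Request(
        f"{solr_url}/{core}/schema",
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the payload so it can be inspected before anything is sent.
    print(json.dumps(build_schema_commands(), indent=2))
    # Uncomment once Solr is running and CORE names a real core:
    # print(post_schema(build_schema_commands()))
```

After the field exists and fmap.content has been removed from the handler defaults, the PDFs need to be re-indexed before the content field shows up in existing documents.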
