Solr File Indexing map content by pages

Question

I would like to index files in Solr. I have already made an "output script" with PHP, but my project leader has given me the task of displaying the page number of the found text.

So:
- I am searching for the word "Foo".
- Solr returns the results along with the highlighted text.
- Now I would like to know which page the highlighted text is on, so that I can find it.

The files are *.pdf files.

One solution I have thought of would be to import the text of the PDF files into different fields, or maybe into a single multivalued field named "content".

Maybe like this:

JSON:
    {
      "content": [
        "page one text",
        "page two text"
      ]
    }

and so on?

Is this possible? Or is there a better way to find this information out? Thanks for your help! :-)

Answer 1

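You need to create a separate Solr document for every page of every PDF file, and store the page number in a field on each document so that every hit carries its page. If you want to return only one result per file, you can use field collapsing (result grouping) to group all the results from the same PDF file.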
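
A minimal sketch of that idea in Python, assuming the page texts have already been extracted from the PDF (the extraction step is not shown). The field names `id`, `file_id`, `page` and `content`, the id scheme, and both helper functions are hypothetical; adapt them to your own schema. The query parameters `hl`, `hl.fl`, `group`, `group.field` and `group.limit` are standard Solr request parameters.

```python
import json
from urllib.parse import urlencode


def build_page_docs(file_id, pages):
    """Return one Solr document per page (field names are assumptions)."""
    return [
        {
            "id": f"{file_id}_p{number}",  # unique key per page
            "file_id": file_id,            # shared by all pages of one PDF
            "page": number,                # 1-based page number
            "content": text,
        }
        for number, text in enumerate(pages, start=1)
    ]


def grouped_query_params(term):
    """Build request parameters that highlight hits and group them by file."""
    return {
        "q": f"content:{term}",
        "fl": "id,file_id,page",
        "hl": "true",
        "hl.fl": "content",
        "group": "true",
        "group.field": "file_id",  # should be a single-valued field
        "group.limit": 3,          # up to 3 matching pages per file
    }


if __name__ == "__main__":
    docs = build_page_docs("manual.pdf", ["page one text", "page two text"])
    print(json.dumps(docs, indent=2))          # body to send to Solr's update handler
    print(urlencode(grouped_query_params("Foo")))  # query string for the select handler
```

Each hit then comes back with its own `page` value, and the highlighting section is keyed by the per-page `id`, so the output script can show the page number next to the highlighted snippet.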

solution1
0 2013-04-06 07:45:58
