
Apache Solr for indexing translated documents

Original post: 2020-06-27 20:27:43 · tag: solr

Does Apache Solr allow the following?

Returning to the user, in addition to the document translated into French, the original text as well as the contexts in which the matched terms are used in the original.

The documents to be indexed are PDF files.

Edit: added an example

I have the original document doc_eng.pdf and the translated document doc_fr.pdf.

When doc_fr.pdf is returned in a query response, I want to be able to get doc_eng.pdf as well, together with the context (highlighting) if possible.
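A minimal sketch of a two-step retrieval for this requirement, assuming (hypothetically) that every document carries a `group_id` field shared by a translation and its original, an `is_original` boolean, and language-specific text fields `content_fr` and `content_en`; the core URL is also an assumption. Step 2 puts the group filter in `q` and the terms to highlight in `hl.q`, so the original is returned even when none of the terms match. Because the highlighter only marks query terms, French search terms will not highlight English text; the hypothetical `english_terms` argument would have to come from a separate translation step.

```python
from urllib.parse import urlencode

# Assumed core URL; adjust to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/docs/select"


def french_search_url(user_query: str) -> str:
    """Step 1: search the French translations only (hypothetical field names)."""
    params = {
        "q": f"content_fr:({user_query})",
        "fq": "is_original:false",
        "fl": "id,group_id",  # group_id links the translation to its original
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def original_with_context_url(group_id: str, english_terms: str) -> str:
    """Step 2: fetch the English original of the same group, with highlighting."""
    params = {
        "q": f'group_id:"{group_id}" AND is_original:true',
        "fl": "id",
        "hl": "true",
        "hl.fl": "content_en",
        # Highlight with English terms; the French query terms would not match.
        "hl.q": f"content_en:({english_terms})",
        "hl.snippets": "3",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    print(french_search_url("contrat de vente"))
    print(original_with_context_url("doc", "sales contract"))
```

Keeping the two requests separate is a design choice: the French hit list stays clean, and the original is fetched only for the results the user actually opens.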

My suggestions:

1- Map doc_fr.pdf and doc_eng.pdf to the same id (if this can be done) and add a boolean field isOriginal = true|false.

2- Use nested documents (but I don't see how this would work with PDF files).
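A sketch of what the two suggestions could look like as Solr JSON update bodies, assuming the PDF text has already been extracted (for example with Tika on the client side) and using the hypothetical field names `group_id`, `is_original`, `content_fr`, and `content_en`. Suggestion 1 cannot literally reuse one id: when a document is added with the same uniqueKey as an existing one, Solr replaces the existing document. The sketch therefore gives each file its own id and links the two through a shared group field. Suggestion 2 uses the `_childDocuments_` key of the JSON update format, which requires the `_root_` field in the schema.

```python
import json


def grouped_docs(group_id: str, fr_text: str, en_text: str) -> list:
    """Suggestion 1: two documents with distinct ids, linked by group_id.

    Reusing one id would not work: Solr replaces a document when another
    one with the same uniqueKey is added.
    """
    return [
        {"id": f"{group_id}_fr", "group_id": group_id, "is_original": False,
         "content_fr": fr_text},
        {"id": f"{group_id}_en", "group_id": group_id, "is_original": True,
         "content_en": en_text},
    ]


def nested_doc(group_id: str, fr_text: str, en_text: str) -> dict:
    """Suggestion 2: the translation as parent, the original as its child."""
    return {
        "id": f"{group_id}_fr",
        "is_original": False,
        "content_fr": fr_text,
        "_childDocuments_": [
            {"id": f"{group_id}_en", "is_original": True, "content_en": en_text},
        ],
    }


if __name__ == "__main__":
    # Text is assumed to have been extracted from doc_fr.pdf / doc_eng.pdf already.
    print(json.dumps(grouped_docs("doc", "texte traduit", "original text"), indent=2))
    print(json.dumps(nested_doc("doc", "texte traduit", "original text"), indent=2))
```

Either body could then be POSTed to the core's `/update` handler. With the nested variant, the `[child]` document transformer (given a suitable `parentFilter`, such as `is_original:false`) can return the original together with its translation.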

1 answer

Yes, Solr can do this. I would suggest using the Apache Tika mechanism.
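One way to use Tika from Solr is the Solr Cell `/update/extract` handler, which runs Tika on the uploaded PDF on the server side. The sketch below only builds the request URLs. It assumes a core named `docs` with the extracting handler enabled in its solrconfig.xml and the hypothetical fields `group_id` and `is_original`. The `literal.*` parameters attach those fields to each file. Because the language of each file is already known in this example, `fmap.content` maps Tika's extracted `content` field straight into a language-specific field.

```python
from urllib.parse import urlencode

# Assumed core "docs" with the Solr Cell extracting handler enabled.
SOLR_EXTRACT = "http://localhost:8983/solr/docs/update/extract"


def extract_url(doc_id: str, group_id: str, lang: str, is_original: bool) -> str:
    """Build the URL for posting one PDF to Solr Cell (Tika runs server-side)."""
    params = {
        "literal.id": doc_id,                 # uniqueKey for this file
        "literal.group_id": group_id,         # hypothetical link field
        "literal.is_original": str(is_original).lower(),
        "fmap.content": f"content_{lang}",    # store Tika's text per language
        "commit": "true",
    }
    return f"{SOLR_EXTRACT}?{urlencode(params)}"


if __name__ == "__main__":
    print(extract_url("doc_fr", "doc", "fr", False))
    print(extract_url("doc_eng", "doc", "en", True))
```

Each PDF would then be sent as the request body, for example with `curl "<url>" -F "myfile=@doc_fr.pdf"`.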

Solr can identify languages and map text to language-specific fields during indexing using the langid UpdateRequestProcessor.
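To make the field mapping concrete, here is an illustrative re-implementation of the mapping step only. It is not Solr code. It roughly mirrors what `langid.fl=content` with `langid.map=true` does: the text of `content` ends up in `content_<lang>`, and the detected code is stored in the field named by `langid.langField`. The detection itself happens inside Solr (Tika or LangDetect), so the sketch takes the detected language as an argument. The field name `language_s` is a hypothetical choice.

```python
def map_to_language_field(doc: dict, fields: list, detected_lang: str,
                          lang_field: str = "language_s",
                          keep_orig: bool = False) -> dict:
    """Illustrative version of langid's field-mapping step (not Solr code).

    detected_lang stands in for the result of Tika or LangDetect detection.
    """
    out = dict(doc)
    out[lang_field] = detected_lang          # like langid.langField
    for name in fields:                      # like langid.fl
        if name in doc:
            out[f"{name}_{detected_lang}"] = doc[name]
            if not keep_orig:                # like langid.map.keepOrig=false
                del out[name]
    return out


if __name__ == "__main__":
    doc = {"id": "doc_fr", "content": "texte traduit en français"}
    print(map_to_language_field(doc, ["content"], "fr"))
```

With this layout, each language gets its own field and its own analyzer, so French and English text can be searched and highlighted separately.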

Solr supports two implementations of this feature:

- Tika's language detection feature
- LangDetect language detection (https://github.com/shuyo/language-detection)

Refer to: https://lucene.apache.org/solr/guide/7_2/language-analysis.html

Related questions

Apache nutch not indexing all documents to apache solr

How to retain HTML coding while indexing HTML documents to Apache Solr?

Indexing markdown documents for full text search in Apache SOLR

Solr Indexing Time of Documents

SOLR 6 - indexing documents

Apache Solr: Can apache solr be used as a third part system for indexing and searching for documents from different websites?

Indexing in Apache Solr

Apache Solr PDF indexing

indexing MySQL in Apache Solr

Indexing solr documents in asynchronous mode


The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please credit the site URL or the original address. For questions, contact: yoyou2525@163.com.




Solution 1: accepted answer, posted 2020-06-27 20:44:44