
Apache Solr for indexing translated documents

Does Apache Solr allow the following?

Returning to the user, in addition to the document translated into French, the original text as well as the contexts of use in the original text.

The documents to be indexed are PDF files.

Edit: added an example

I have the original document doc_eng.pdf and the translated document doc_fr.pdf.

When doc_fr.pdf is returned in a query response, I want to be able to get doc_eng.pdf as well, with the context (highlighting) if possible.

My suggestions

1- Map doc_fr.pdf and doc_eng.pdf to the same id (if this can be done) and add a boolean field isOriginal = true|false.

2- Use nested documents (but I don't see how this would work with PDF files).
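A minimal sketch of option 1, under some assumptions. Two documents cannot share the same Solr uniqueKey (the second add would overwrite the first), so this sketch gives each document its own `id` and links the pair through a separate shared field. The field names `pair_id`, `language`, `is_original` and `content`, and both helper functions, are hypothetical. The text is assumed to have been extracted from the PDFs already. The sketch only builds the documents and the query parameters; it does not talk to Solr.

```python
def build_pair(pair_id, original_text, translated_text,
               original_lang="en", translated_lang="fr"):
    """Return two Solr-style documents linked by a shared pair_id.

    All field names here are assumptions for illustration, not a Solr default.
    """
    def make_doc(lang, text, is_original):
        return {
            "id": f"{pair_id}_{lang}",   # unique per document
            "pair_id": pair_id,          # shared by original and translation
            "language": lang,
            "is_original": is_original,
            "content": text,
        }

    return [
        make_doc(original_lang, original_text, True),
        make_doc(translated_lang, translated_text, False),
    ]


def original_params(hit, terms):
    """Build /select parameters that fetch the original of a translated hit.

    `terms` must already be in the original document's language; this sketch
    does not translate the user's query. hl, hl.fl and hl.q are standard Solr
    highlighting parameters; hl.q lets the highlighted terms differ from q.
    """
    return {
        "q": "*:*",
        "fq": [f'pair_id:"{hit["pair_id"]}"', "is_original:true"],
        "hl": "true",
        "hl.fl": "content",
        "hl.q": f"content:({terms})",
    }


docs = build_pair("doc42", "text of doc_eng.pdf", "texte de doc_fr.pdf")
params = original_params(docs[1], "text")
```

The dictionaries in `docs` could be sent to a collection's update handler as JSON, and `params` passed to its select handler once the French document comes back as a hit. Whether any highlight snippets are returned depends on the `hl.q` terms actually occurring in the original text.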

Yes, Solr can do this. I would suggest using Apache Tika.

Solr can identify languages and map text to language-specific fields during indexing using the langid UpdateRequestProcessor.

Solr supports two implementations of this feature:

Tika's language detection feature

[LangDetect language detection](https://github.com/shuyo/language-detection)

Refer: https://lucene.apache.org/solr/guide/7_2/language-analysis.html

