Apache Solr: Correct use of CompoundWordFilter

I'm trying to figure out how best to configure Solr for my app. I'm indexing (mostly German) PDF documents and querying Solr with dismax queries.

If a document contains the word "Firmenprofil" (a German compound word meaning "company profile"), it is only returned for queries for exactly that word. However, I would like queries containing only "Profil" to return this document as well.

I downloaded a German dictionary file and applied a DictionaryCompoundWordTokenFilter to both the index analyzer and the query analyzer.

The problem is that the filter decomposes the query into very small parts (e.g. "pro" in the case of "Firmenprofil"), which results in all sorts of documents containing words like "Product" being returned.
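To see why short fragments such as "pro" show up, here is a minimal Python sketch of dictionary-based decomposition. It is a simplified model, not Solr's actual code: the function name `decompose`, the two tiny word sets, and the parameter names (chosen to mirror the filter's `minSubwordSize`, `maxSubwordSize`, and `onlyLongestMatch` options) are assumptions made for illustration. In particular, applying the longest-match rule per start position is an assumption about how the real filter behaves and may differ between versions.

```python
def decompose(word, dictionary, min_subword_size=2, max_subword_size=15,
              only_longest_match=False):
    """Return the dictionary subwords found inside `word` (simplified model)."""
    word = word.lower()
    found = []
    # Try every start position that still leaves room for a minimal subword.
    for start in range(len(word) - min_subword_size + 1):
        longest = None
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(word):
                break
            candidate = word[start:start + size]
            if candidate in dictionary:
                if only_longest_match:
                    # Sizes grow, so the last hit at this position is the longest.
                    longest = candidate
                else:
                    found.append(candidate)
        if only_longest_match and longest is not None:
            found.append(longest)
    return found


# Hypothetical dictionaries: a large word list tends to contain many short entries.
big_dictionary = {"firmen", "firm", "profil", "pro", "men", "fil"}
small_dictionary = {"firmen", "profil"}

print(decompose("Firmenprofil", big_dictionary))
print(decompose("Firmenprofil", big_dictionary, only_longest_match=True))
print(decompose("Firmenprofil", small_dictionary))
```

In this model, every short entry in the word list becomes an extra token, so the size and quality of the dictionary matter at least as much as the filter options. The longest-match setting drops "pro" here only because "profil" starts at the same position; fragments such as "men" and "fil" still survive.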

I tried removing the filter from the query analyzer, which leads to Solr not finding the document at all. I also tried leaving the filter in the query analyzer and explicitly setting the onlyLongestMatch option to true, but that didn't seem to have any effect.

OK, it seems my dictionary file was simply too big (~20 MB). I replaced it with a more compact one, and now it works just fine.

Without your actual config files, it's a bit of a guessing game.

Did you check whether "profil" is part of the dictionary?
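One quick way to check this is a short Python script. This is only a sketch: it assumes the dictionary is a UTF-8 text file with one word per line and "#" comment lines, and the helper name `dictionary_contains` and the file name `german-words.txt` are hypothetical.

```python
import tempfile
from pathlib import Path


def dictionary_contains(path, word):
    """Case-insensitive check for `word` as a whole line in a word-list file."""
    target = word.strip().lower()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            entry = line.strip()
            # Skip blank lines and comment lines (assumed file format).
            if not entry or entry.startswith("#"):
                continue
            if entry.lower() == target:
                return True
    return False


# Demonstration with a throwaway file; point `path` at the real dictionary instead.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "german-words.txt"  # hypothetical file name
    sample.write_text("# sample word list\nFirmen\nProfil\n", encoding="utf-8")
    print(dictionary_contains(sample, "profil"))
    print(dictionary_contains(sample, "Produkt"))
```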
