I've set up stemming in Elasticsearch, so when I search for "code" I also match "codes", "coding", etc.
My problem arises when I use the "must_not" field in my queries. When I include "code" in the "must_not" field, it works and I still get the results I expect, but when I search for "codes" I don't get back any results, even though there are definitely documents that contain the word "codes".
My query is as follows:
// Build one term filter per excluded word
for(var i = 0; i < exclude_words.length; i++)
{
    must_not.push({term:{text:exclude_words[i].toLowerCase()}});
}
query = {
    "filtered": {
        "query": {
            "dis_max": {
                "queries": [
                    {"match": {"text": term}},
                    {"match": {"title": term}}
                ]
            }
        },
        "filter": {
            "bool": {
                "must_not": must_not
            }
        }
    }
}
I'm using the Elasticsearch API for Node.js to construct my queries and get results from Elasticsearch.
I'm assuming I'm having this problem because of stemming and that "codes" is stored as "code" in the search index.
Is there a way to solve this without using an external algorithm to stem my queries as well? Or is there an elegant way to solve this issue?
Any help is much appreciated!
Update
This is my analyzer:
{
    "settings": {
        "analysis": {
            "analyzer": {
                "stopword_analyzer": {
                    "type": "snowball",
                    "stopwords": ["a", "able", "about", "across", "after", "all", "almost", "also", "am", "among", "an", "and", "any", "are", "as", "at", "be", "because", "been", "but", "by", "can", "cannot", "could", "dear", "did", "do", "does", "either", "else", "ever","every", "for", "from", "get", "got", "had", "has", "have", "he", "her", "hers", "him", "his", "how", "however", "i", "if", "in", "into", "is", "it", "its", "just", "least", "let", "like", "may", "me", "might", "most", "must", "my", "neither", "no", "nor", "not", "of", "off", "often", "on", "only", "or", "other", "our", "own", "rather", "said", "say", "says", "she", "should", "since", "so", "some", "than", "that", "the", "their", "them", "then", "there", "these", "they", "this", "tis", "to", "too", "us", "wants", "was", "we", "were", "what", "when", "where", "which", "while", "who", "whom", "why", "will", "with", "would", "yet", "you", "your"]
                }
            }
        }
    }
}
The text field has the following mapping:
"text": {
    "type": "string",
    "analyzer": "stopword_analyzer"
}
When I include "code" in the "must_not" field, it's fine and I still get my results as expected
It's not about must_not; it's about the term filter you use inside must_not. The term filter takes your search text ("code", "codes", or whatever) and filters on that exact value, without analyzing it.
But the analyzer you are using changes the terms that get indexed. For example, if you index "coding", the term actually stored in the inverted index is "code". Remember that term searches for exact values. So if you search for "codes", it will not be found, because the only term in the index for your document is "code".
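To make the difference concrete, here is a small self-contained sketch in plain JavaScript. It does not call Elasticsearch at all: the `STEMS` lookup table is a toy stand-in for the snowball stemmer, and `analyze`, `termMatches` and `matchMatches` are hypothetical names invented for this illustration only.

```javascript
// Toy stand-in for the snowball analyzer: a fixed lookup table,
// NOT the real stemming algorithm (assumption for illustration).
const STEMS = { code: "code", codes: "code", coding: "code" };

// Lowercase, split on whitespace, and map each word to its stem.
function analyze(text) {
    return text
        .toLowerCase()
        .split(/\s+/)
        .filter(Boolean)
        .map(function (word) { return STEMS[word] || word; });
}

// What ends up in the (toy) inverted index for one document.
const indexedTerms = new Set(analyze("She codes daily"));

// Behaves like a term filter: the value is compared as-is.
function termMatches(value) {
    return indexedTerms.has(value);
}

// Behaves like a match query: the value is analyzed first.
function matchMatches(value) {
    return analyze(value).some(function (t) { return indexedTerms.has(t); });
}

console.log(termMatches("codes"));  // false: "codes" was never indexed
console.log(termMatches("code"));   // true: the stem is what got indexed
console.log(matchMatches("codes")); // true: "codes" is stemmed to "code" first
```

The document contains the word "codes", but only the stem "code" is in the index, so an unanalyzed lookup for "codes" finds nothing.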
I suggest trying out match
instead of term
in the must_not
part as that will use the analyzer at search time as well. Something like this:
"filter": {
    "bool": {
        "must_not": [
            {
                "query": {
                    "match": {
                        "text": "codes"
                    }
                }
            },
            {
                "query": {
                    "match": {
                        "text": "coding"
                    }
                }
            }
        ]
    }
}
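Applied to the loop from the question, this could look roughly like the sketch below. It assumes the same older query DSL the question uses (the `filtered` query and the `query` filter wrapper); `buildMustNot` and `buildQuery` are hypothetical helper names, not part of the Elasticsearch client.

```javascript
// Hypothetical helper: one analyzed match clause per excluded word,
// wrapped in a "query" filter so it can sit inside a bool filter.
function buildMustNot(excludeWords) {
    return excludeWords.map(function (word) {
        // No toLowerCase() needed here: match runs the field's analyzer
        // on the word at search time.
        return { query: { match: { text: word } } };
    });
}

// Hypothetical helper: same overall shape as the query in the question.
function buildQuery(term, excludeWords) {
    return {
        filtered: {
            query: {
                dis_max: {
                    queries: [
                        { match: { text: term } },
                        { match: { title: term } }
                    ]
                }
            },
            filter: {
                bool: { must_not: buildMustNot(excludeWords) }
            }
        }
    };
}

// Example: search for "search" while excluding "codes" and "coding".
const exampleQuery = buildQuery("search", ["codes", "coding"]);
console.log(JSON.stringify(exampleQuery, null, 2));
```

The resulting object can be passed as the query part of the search body in the same way as the original `query` variable.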