
Create a list of custom stop words in Elasticsearch using Java

To improve the search results I get from Elasticsearch, I want to extend the stop word list from my Java code. So far I have been using the default list of the stop analyzer, which does not include interrogative words such as What, Who, Why, etc. We want to remove these words, and some additional ones, from the query text when searching. I have tried the code from here (the last answer):

PUT /my_index
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "standard",
                    "stopwords": [ "and", "the" ]
                }
            }
        }
    }
}

I tried to use this from Java, but it wasn't working for me. My main question:

How do we create our own list of stop words, and how do we apply it in our code with a query like the following?

// Query string search over several fields, analyzed with the built-in "stop" analyzer
QueryStringQueryBuilder qb = new QueryStringQueryBuilder(text).analyzer("stop");
qb.field("question_title");
qb.field("level");
qb.field("category");
qb.field("question_tags");
SearchResponse response = client.prepareSearch("questionindex")
        .setSearchType(SearchType.QUERY_AND_FETCH)
        .setQuery(qb)
        .execute()
        .actionGet();
SearchHit[] results = response.getHits().getHits();
// Print the number of hits returned
System.out.println("response-" + results.length);

Currently I am using the default stop analyzer, which only removes a limited set of stop words:

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"

But I want to extend this list.

You're on the right track. In your first listing (from the documentation about stopwords) you created a custom analyzer called my_analyzer for the index called my_index. It removes "and" and "the" from any text that is analyzed with my_analyzer.
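To make the effect of that stop word list concrete, here is a rough plain-Java imitation of what happens to a piece of text. It is only an illustration under simplifying assumptions: the class name StopWordDemo and the whitespace splitting are mine, and the real standard analyzer tokenizes far more carefully. Note that "no" survives, because a custom stopwords list replaces the default list rather than adding to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: imitates lowercasing plus stop word removal in plain Java.
// This is NOT Elasticsearch's tokenizer; it just splits on whitespace.
class StopWordDemo {

    // Same stop words as in the my_analyzer definition from the question.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("and", "the"));

    // Lowercase each whitespace-separated token and drop the stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "and" and "the" are removed; every other word is kept.
        System.out.println(analyze("No quick brown fox jumps over my lazy dog and the indolent cat"));
    }
}
```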

Now to actually use it, you should:

  1. Make sure that you've defined my_analyzer on the index you're querying (questionindex?)
  2. Create a mapping for your documents that uses my_analyzer for the fields where you would like to remove "and" and "the" (for example the question_title field)
  3. Test out your analyzer using the Analyze API

    GET /questionindex/_analyze?field=question.question_title&text=No quick brown fox jumps over my lazy dog and the indolent cat

  4. Reindex your documents
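
If you want to issue the Analyze API request from step 3 from Java rather than by hand, the text parameter has to be URL-encoded, since it contains spaces. The following is a small sketch that only builds the request URL; the class name AnalyzeUrl and the base URL http://localhost:9200 are assumptions for illustration, so adjust them to your cluster.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: builds the URL for the Analyze API request shown in step 3.
// The host and port used in main are assumptions, not taken from the question.
class AnalyzeUrl {

    static String build(String baseUrl, String index, String field, String text) {
        try {
            // URLEncoder produces form encoding, so spaces become '+'.
            return baseUrl + "/" + index + "/_analyze"
                    + "?field=" + URLEncoder.encode(field, "UTF-8")
                    + "&text=" + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:9200", "questionindex",
                "question.question_title",
                "No quick brown fox jumps over my lazy dog and the indolent cat"));
    }
}
```

The resulting string can be pasted into a browser or passed to any HTTP client; the response lists the tokens that remain after analysis.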


Try this as a starting point:

POST /questionindex
{
    "settings" : {
        "analysis": {
            "analyzer": {
                "my_analyzer": { 
                    "type": "standard", 
                    "stopwords": [ "and", "the" ] 
                }
            }
        }
    },
    "mappings" : {
        "question" : {
            "properties" : {
                "question_title" : { 
                    "type" : "string", 
                    "analyzer" : "my_analyzer" 
                },
                "level" : { 
                    "type" : "integer" 
                },
                "category" : { 
                    "type" : "string" 
                },
                "question_tags" : { 
                    "type" : "string" 
                }
            }
        }
    }
}
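
Since you want to extend the default list rather than replace it, one option is to build the stopwords array in Java and generate the "settings" part of the request from it. The sketch below does that with plain string building. The class name StopWordSettings and its helper methods are hypothetical, the default words are the ones quoted in the question, and the output contains only the settings block, so the mappings from the listing above would still have to be added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch: merges the default stop words with extra ones and renders the
// "settings" JSON for a custom analyzer. All names here are illustrative.
class StopWordSettings {

    // Default stop word list as quoted in the question.
    static final List<String> DEFAULT_STOP_WORDS = Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
            "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
            "such", "that", "the", "their", "then", "there", "these", "they",
            "this", "to", "was", "will", "with");

    // Defaults first, then the extras, lowercased and without duplicates.
    static List<String> extend(List<String> extras) {
        Set<String> merged = new LinkedHashSet<>(DEFAULT_STOP_WORDS);
        for (String word : extras) {
            merged.add(word.toLowerCase(Locale.ROOT));
        }
        return new ArrayList<>(merged);
    }

    // Minimal JSON string escaping (backslash and double quote only).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Renders {"settings":{"analysis":{"analyzer":{<name>:{...}}}}}.
    static String toSettingsJson(String analyzerName, List<String> stopWords) {
        StringBuilder json = new StringBuilder();
        json.append("{\"settings\":{\"analysis\":{\"analyzer\":{\"")
            .append(escape(analyzerName))
            .append("\":{\"type\":\"standard\",\"stopwords\":[");
        for (int i = 0; i < stopWords.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(escape(stopWords.get(i))).append('"');
        }
        return json.append("]}}}}}").toString();
    }

    public static void main(String[] args) {
        List<String> words = extend(Arrays.asList("what", "who", "why"));
        System.out.println(toSettingsJson("my_analyzer", words));
    }
}
```

Keep in mind that an analyzer defined this way is applied where the mapping (or the query) names it, so the query in the question would need to use my_analyzer instead of "stop" to benefit from the longer list.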


 