Create list of custom stop words in Elasticsearch using Java

To improve the search results I get from Elasticsearch, I want to extend the stop word list from my Java code. Until now I have been using the default list of the stop analyzer, which does not include interrogative words such as "What", "Who", "Why", etc. We want to remove these words, and some additional ones, from the search text when querying. I have tried the code from here (the last answer):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "standard",
          "stopwords": [ "and", "the" ]
        }
      }
    }
  }
}

I tried to use this from Java, but it did not work for me. My main question:

How can we create our own list of stop words, and how do we apply it in our code with a query like this one?

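import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.elasticsearch.search.SearchHit;

// Search several fields, analyzing the query text with the built-in "stop" analyzer.
QueryStringQueryBuilder qb = new QueryStringQueryBuilder(text).analyzer("stop");
qb.field("question_title");
qb.field("level");
qb.field("category");
qb.field("question_tags");
SearchResponse response = client.prepareSearch("questionindex")
        .setSearchType(SearchType.QUERY_AND_FETCH)
        .setQuery(qb)
        .execute()
        .actionGet();
SearchHit[] results = response.getHits().getHits();
System.out.println("response-" + results.length);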

Currently I am using the default stop analyzer, which removes only a limited set of stop words:

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"

But I want to extend this list.

You're on the right track. In your first listing (from the documentation about stopwords) you created a custom analyzer called my_analyzer for the index called my_index. Its effect is to remove "and" and "the" from any text that you use my_analyzer with.
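If it helps to see what such a stop word list does to a piece of text, here is a rough plain-Java illustration. It is not Elasticsearch code: the class StopWordDemo and its analyze method are hypothetical names, and the tokenization (lowercase, then split on anything that is not a letter or digit) is a simplifying assumption that only approximates the standard analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for an analyzer with a custom stop word list.
// Assumption: the real standard analyzer uses Unicode text segmentation,
// which this sketch does not replicate.
class StopWordDemo {

    // Lowercases the text, splits on anything that is not a letter or digit,
    // and drops every token that appears in the stop word set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("and", "the"));
        // prints [quick, fox, lazy, dog]
        System.out.println(analyze("The quick fox and the lazy dog", stopWords));
    }
}
```

Adding more words to the set removes more tokens, which is the same idea as putting more entries in the "stopwords" array of my_analyzer.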

Now to actually use it, you should:

  1. Make sure that you've defined my_analyzer on the index you're querying (questionindex?)
  2. Create a mapping for your documents that uses my_analyzer for the fields where you would like to remove "and" and "the" (for example the question_title field)
  3. Test out your analyzer using the Analyze API

    GET /questionindex/_analyze?field=question.question_title&text=No quick brown fox jumps over my lazy dog and the indolent cat

  4. Reindex your documents
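For step 3, the sample text contains spaces, so it has to be URL-encoded if you send the request from Java rather than from a console. The following is a minimal sketch that only builds the request URL with the JDK. The class name AnalyzeUrlBuilder and the base URL http://localhost:9200 are assumptions, and it presumes an Elasticsearch version that accepts field and text as query parameters, as the request above does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the URL for an Analyze API request.
// It does not send anything; use whatever HTTP client you prefer for that.
class AnalyzeUrlBuilder {

    static String analyzeUrl(String baseUrl, String index, String field, String text) {
        return baseUrl + "/" + index + "/_analyze"
                + "?field=" + encode(field)
                + "&text=" + encode(text);
    }

    // URLEncoder produces form encoding ("+" for spaces); "%20" is
    // unambiguous inside a URL, so spaces are rewritten to that form.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyzeUrl(
                "http://localhost:9200",      // assumed host and port
                "questionindex",
                "question.question_title",
                "No quick brown fox jumps over my lazy dog and the indolent cat"));
    }
}
```

If my_analyzer is wired up correctly, the tokens returned for that URL should no longer contain "and" or "the".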


Try this as a starting point:

POST /questionindex
{
    "settings" : {
        "analysis": {
            "analyzer": {
                "my_analyzer": { 
                    "type": "standard", 
                    "stopwords": [ "and", "the" ] 
                }
            }
        }
    },
    "mappings" : {
        "question" : {
            "properties" : {
                "question_title" : { 
                    "type" : "string", 
                    "analyzer" : "my_analyzer" 
                },
                "level" : { 
                    "type" : "integer" 
                },
                "category" : { 
                    "type" : "string" 
                },
                "question_tags" : { 
                    "type" : "string" 
                }
            }
        }
    }
}
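Since the goal is to grow the list from Java, here is a small sketch of how the "settings" part of such a request body could be generated from a list of words. It uses only the JDK; StopWordSettings, merge and toSettingsJson are hypothetical names, not part of any Elasticsearch client. It combines the default words quoted in the question with the interrogatives mentioned there ("What", "Who", "Why") and leaves out the "mappings" part shown above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper: builds the "settings" JSON for a standard analyzer
// with an extended stop word list.
class StopWordSettings {

    // The default English stop words quoted in the question.
    static final List<String> DEFAULT_STOP_WORDS = Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
            "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
            "that", "the", "their", "then", "there", "these", "they", "this",
            "to", "was", "will", "with");

    // Combines both lists: lowercased, de-duplicated, insertion order kept.
    static Set<String> merge(List<String> base, List<String> extra) {
        Set<String> merged = new LinkedHashSet<>();
        for (String word : base) merged.add(word.toLowerCase(Locale.ROOT));
        for (String word : extra) merged.add(word.toLowerCase(Locale.ROOT));
        return merged;
    }

    // Renders the settings body. Only backslash and double quote are escaped,
    // which assumes the stop words are ordinary words.
    static String toSettingsJson(String analyzerName, Collection<String> stopWords) {
        StringJoiner words = new StringJoiner(",", "[", "]");
        for (String word : stopWords) {
            words.add("\"" + word.replace("\\", "\\\\").replace("\"", "\\\"") + "\"");
        }
        return "{\"settings\":{\"analysis\":{\"analyzer\":{\"" + analyzerName
                + "\":{\"type\":\"standard\",\"stopwords\":" + words + "}}}}}";
    }

    public static void main(String[] args) {
        Set<String> stopWords = merge(DEFAULT_STOP_WORDS, Arrays.asList("What", "Who", "Why"));
        System.out.println(toSettingsJson("my_analyzer", stopWords));
    }
}
```

The sketch spells out the default words because the "stopwords" setting takes either a predefined list or an explicit array, so the new words cannot simply be appended to the built-in list. Once the index has been created with these settings, the query from the question would presumably pass "my_analyzer" to analyzer(...) instead of "stop".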
