Elasticsearch not returning all relevant results

I am using Elasticsearch to search for files stored in MongoDB. I would like to retrieve all files whose names match a pattern. When I run the query in MongoDB, it returns 6754 files.

FSsearch:PRIMARY> db.fs.files.find({"filename":/.*Mail.*/}).count();

6754

But when I try the same search with Elasticsearch, it returns only 85 files. Is there any way to get all the files from Elasticsearch?

curl -XGET "localhost:9200/submission_idx/files/_search?search_type=scan&scroll=10m&size=7000&pretty=1" -d '{"query" : {
"field" : {
        "filename" : "*Mail*"
    }                           
}                            
}'

{
  "_scroll_id" : "c2Nhbjs1OzIyMDpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxODpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxNjpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxOTpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxNzpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzE7dG90YWxfaGl0czo4NTs=",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 85,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}

You can use the Regexp Filter (or Regexp Query):

{
    "filtered": {
        "query": {
            "match_all": {}
        },
        "filter": {
            "regexp":{
                "filename" : ".*mail.*"
            }
        }
    }
}
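The fragment above is only the inner query; in a search request it sits under a top-level "query" key. Below is a minimal Python sketch that builds such a body, assuming an Elasticsearch version that still supports the "filtered" query used in this answer. The helper name build_regexp_search is hypothetical, and the pattern uses regexp syntax, where ".*" stands for any run of characters.

```python
import json


def build_regexp_search(field, pattern):
    """Build a search body that wraps a regexp filter in a filtered query.

    Hypothetical helper for illustration; it only assembles a dict and
    does not talk to a cluster.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"regexp": {field: pattern}},
            }
        }
    }


# ".*" means "any characters" in regexp syntax (unlike the "*" wildcard).
body = build_regexp_search("filename", ".*mail.*")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d payload of a curl request like the one in the question.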

Notice the lower-case "m" in "mail". By default, Elasticsearch analyzes string fields with an analyzer that lowercases every token, so the index holds "mail" rather than "Mail". The pattern in a regexp query is not analyzed, so a capitalized "Mail" is compared as-is against the lowercased terms and does not match them. You can turn off the default lowercasing by marking the field as "not_analyzed" or by creating your own custom analyzer.
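To see why the case of the pattern matters, here is a small Python simulation. It is only a rough stand-in, not Elasticsearch's real tokenizer: the analyze function is an assumption that splits on non-alphanumeric characters and lowercases, and the file names are made up for illustration.

```python
import re


def analyze(text):
    """Rough stand-in for a lowercasing analyzer (assumption, not the real one)."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def regexp_hits(docs, pattern, analyzed=True):
    """Return the docs that have at least one indexed term fully matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for doc in docs:
        # An analyzed field indexes lowercased tokens; a not_analyzed
        # field indexes the original string as a single term.
        terms = analyze(doc) if analyzed else [doc]
        if any(regex.fullmatch(term) for term in terms):
            hits.append(doc)
    return hits


# Made-up file names for illustration.
filenames = ["MailArchive.zip", "old mail.txt", "Report.pdf"]

print(regexp_hits(filenames, ".*Mail.*"))                  # analyzed field, capital M
print(regexp_hits(filenames, ".*mail.*"))                  # analyzed field, lower-case m
print(regexp_hits(filenames, ".*Mail.*", analyzed=False))  # not_analyzed field
```

In this sketch the capitalized pattern finds nothing on the analyzed field, the lower-case pattern finds both mail files, and the capitalized pattern on the unanalyzed field finds only the file whose stored name contains "Mail" exactly.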

Also, be aware that wildcards, especially at the beginning of a pattern, can be very slow and memory-consuming when searching large datasets.
