Get documents matching a condition with the Elasticsearch Java API

As far as I know, Elasticsearch can parse and index documents, and when we search for a keyword it returns the matching documents. I run the search with this Java API code:

  org.elasticsearch.action.search.SearchResponse searchHits =  node.client()
            .prepareSearch()
            .setIndices("indices")
            .setQuery(qb)
            .setFrom(0).setSize(1000)
            .addHighlightedField("file.filename")
            .addHighlightedField("content")
            .addHighlightedField("meta.title")
            .setHighlighterPreTags("<span class='badge badge-info'>")
            .setHighlighterPostTags("</span>")
            .addFields("*", "_source")
            .execute().actionGet();

Now my question: suppose some documents contain strings like these:

Jun 2010 to Sep 2011                First Document          
Jun 2009 to Aug 2011                Second Document             
Nov 2011 – Sep 2012                 Third Document   
Nov  2012- Sep 2013                 Forth Document   
Nov 2013 – Current                  First Document   
June 2014 – Feb 2015                Third Document   
Jan 2013 – Jan 2014                 Second Document   
July 2008 – Oct 2012                First Document   
May 2007 – Current                  Forth Document   

I want to retrieve the documents whose period length falls into one of these ranges:

1 to 12 months
13-24 months
26-48 months

How can I do this?

When documents are indexed in this form, Elasticsearch will not be able to parse those strings as dates. If you transform those strings into correctly formatted timestamps, the only way to run the query you propose is to index the documents in this format

{
  "start": "2010-09",
  "end": "2011-10",
  // rest of the document
}

and then run a script-filtered query over them, using a script (written in one of the scripting languages Elasticsearch provides) that calculates the difference between those two dates. Bear in mind that script filtering and scoring are generally much slower than a simple index lookup.
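Turning the raw strings from the question into `start` and `end` values has to happen before indexing. Below is a minimal sketch of that preprocessing step in plain Java. The class `PeriodParser`, its methods, and the choice to resolve "Current" to a caller-supplied month are all assumptions made for illustration. None of this is part of Elasticsearch.

```java
import java.time.Month;
import java.time.YearMonth;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Elasticsearch API): parses "Jun 2010 to Sep 2011" style periods.
final class PeriodParser {

    // Month name and year, a separator ("to", "-" or an en dash), then month name and year or "Current".
    private static final Pattern PERIOD = Pattern.compile(
            "^\\s*([A-Za-z]+)\\s+(\\d{4})\\s*(?:to|-|\u2013)\\s*(?:([A-Za-z]+)\\s+(\\d{4})|Current)\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Accepts "Jun", "June", "JULY" and so on by matching the first three letters.
    static Month parseMonth(String name) {
        if (name.length() < 3) {
            throw new IllegalArgumentException("Unknown month: " + name);
        }
        String prefix = name.substring(0, 3).toUpperCase(Locale.ROOT);
        for (Month month : Month.values()) {
            if (month.name().startsWith(prefix)) {
                return month;
            }
        }
        throw new IllegalArgumentException("Unknown month: " + name);
    }

    // Returns {start, end}. Assumption: an open-ended "Current" period ends at the given month.
    static YearMonth[] parse(String period, YearMonth current) {
        Matcher m = PERIOD.matcher(period);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised period: " + period);
        }
        YearMonth start = YearMonth.of(Integer.parseInt(m.group(2)), parseMonth(m.group(1)));
        YearMonth end = (m.group(3) == null)
                ? current
                : YearMonth.of(Integer.parseInt(m.group(4)), parseMonth(m.group(3)));
        return new YearMonth[] {start, end};
    }

    public static void main(String[] args) {
        YearMonth[] p = parse("Jun 2010 to Sep 2011", YearMonth.now());
        System.out.println(p[0] + " .. " + p[1]); // prints 2010-06 .. 2011-09
    }
}
```

`YearMonth.toString()` already produces the `yyyy-MM` form used in the example document, so the two values can be written into the `start` and `end` fields directly.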

A much faster and cleaner approach is to index the duration of the period (in months) alongside the start and end dates, like so

{
  "start": "2010-09",
  "end": "2011-10",
  "duration": 13
  // the rest of the document
}
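The `duration` value can be computed at indexing time from the two dates. A minimal sketch in plain Java, assuming the `yyyy-MM` format shown above and counting whole months between start and end. The class and method names are hypothetical.

```java
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: derives the "duration" field from "start" and "end".
final class DurationField {

    // Whole months from start to end, both given as "yyyy-MM" strings.
    static long monthsBetween(String start, String end) {
        return ChronoUnit.MONTHS.between(YearMonth.parse(start), YearMonth.parse(end));
    }

    public static void main(String[] args) {
        // Same values as the example document above.
        System.out.println(monthsBetween("2010-09", "2011-10")); // prints 13
    }
}
```

Whether a period such as Jun 2010 to Sep 2011 should count as 15 months (end exclusive, as here) or 16 (both months included) is a modelling decision. Adjust by one if the inclusive count is what you need.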

If you index your documents in this form, you can simply run a filtered query on the duration field:

{
   "query":{
      "filtered":{
         "filter":{
            "range":{
               "duration":{
                  "gte":1,
                  "lte":12
               }
            }
         }
      }
   }
}
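The question asks for three ranges, so the same request body is needed with different bounds. Here is a small sketch in plain Java that generates it as a string, with both bounds placed in a single `range` clause. The class `DurationQuery` and the method `rangeBody` are hypothetical names, and the body keeps the `filtered` query form used in this answer, which belongs to older Elasticsearch releases.

```java
import java.util.Locale;

// Hypothetical helper: builds the JSON request body for one duration range.
final class DurationQuery {

    // Filtered query with a single range filter on the "duration" field (bounds inclusive).
    static String rangeBody(int minMonths, int maxMonths) {
        if (minMonths > maxMonths) {
            throw new IllegalArgumentException("minMonths must not exceed maxMonths");
        }
        return String.format(Locale.ROOT,
                "{\"query\":{\"filtered\":{\"filter\":{\"range\":"
                        + "{\"duration\":{\"gte\":%d,\"lte\":%d}}}}}}",
                minMonths, maxMonths);
    }

    public static void main(String[] args) {
        // The three ranges exactly as written in the question.
        int[][] ranges = {{1, 12}, {13, 24}, {26, 48}};
        for (int[] r : ranges) {
            System.out.println(rangeBody(r[0], r[1]));
        }
    }
}
```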
