Get a specified number of array elements from an Elasticsearch query

Question

I have an Elasticsearch index whose records contain an array. Say the field name is "samples" and the array is:

["abc","xyz","mnp".....]

Is there a query that lets me specify the number of elements to retrieve from the array? Say I want the retrieved record to contain only the first 2 elements of the samples array.

Answer 1

Assuming you have an array of strings as a document, I have a couple of ideas that might help you.

PUT /arrayindex/
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "spacelyzer": {
            "tokenizer": "whitespace"
          },
          "commalyzer": {
            "type": "custom",
            "tokenizer": "commatokenizer",
            "char_filter": "square_bracket"
          }
        },
        "tokenizer": {
          "commatokenizer": {
            "type": "pattern",
            "pattern": ","
          }
        },
        "char_filter": {
          "square_bracket": {
            "type": "mapping",
            "mappings": [
              "[=>",
              "]=>"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "array_set": {
      "properties": {
        "array_space": {
          "analyzer": "spacelyzer",
          "type": "string"
        },
        "array_comma": {
          "analyzer": "commalyzer",
          "type": "string"
        }
      }
    }
  }
}

POST /arrayindex/array_set/1
{
  "array_space": "qwer qweee trrww ooenriwu njj"
}

POST /arrayindex/array_set/2
{
  "array_comma": "[qwer,qweee,trrww,ooenriwu,njj]"
}

The above DSL accepts two types of arrays: one is a whitespace-separated string in which every word represents an element of the array, and the other is the type of array you specified. Such an array is possible in Python, and if you index such a document from Python it is automatically converted to a string, i.e. ["abc","xyz","mnp".....] would be converted to "["abc","xyz","mnp".....]".

spacelyzer tokenizes on whitespace, while commalyzer tokenizes on commas and removes [ and ] from the string.
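
To make the two analyzers concrete, here is a rough pure-Python approximation of what they produce for the sample documents. The function names are made up for this illustration, and this only mimics the analysis chain; it does not call Elasticsearch, so treat it as a sketch of the expected tokens and positions rather than the real analyzer.

```python
def spacelyzer_tokens(text):
    # Approximates the "whitespace" tokenizer: split on runs of whitespace.
    return text.split()


def commalyzer_tokens(text):
    # Approximates the "square_bracket" char filter: drop "[" and "]".
    stripped = text.replace("[", "").replace("]", "")
    # Approximates the pattern tokenizer with pattern ",": split on commas,
    # skipping empty pieces. Whitespace around elements is NOT trimmed.
    return [token for token in stripped.split(",") if token]


space_doc = "qwer qweee trrww ooenriwu njj"
comma_doc = "[qwer,qweee,trrww,ooenriwu,njj]"

# The token position is simply the index of the token in the stream.
for position, token in enumerate(commalyzer_tokens(comma_doc)):
    print(position, token)
```

Both helper functions yield the same five tokens in the same order for the two sample documents, which is why the positions reported for either field line up with the array indexes.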

Now if you call the Termvector API like this:

GET arrayindex/array_set/1/_termvector
{
  "fields" : ["array_space", "array_comma"],
  "term_statistics" : true,
  "field_statistics" : true
}

GET arrayindex/array_set/2/_termvector
{
  "fields" : ["array_space", "array_comma"],
  "term_statistics" : true,
  "field_statistics" : true
}

You can simply get the position of an element from the responses, e.g. to find the position of "njj" use

termvector_response["term_vectors"]["array_comma"]["terms"]["njj"]["tokens"][0]["position"] or,
termvector_response["term_vectors"]["array_space"]["terms"]["njj"]["tokens"][0]["position"]

Both will give you 4, which is the actual index of the element in the array specified. I suggest you go with the whitespace-separated design.
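
The lookup can be wrapped in a small helper that does not fail when a term is missing. The sketch below runs against a hand-written dictionary that only imitates the shape of a term vector response (real responses carry more keys, such as offsets and statistics); `term_positions` and `sample_response` are names invented for this example.

```python
def term_positions(response, field, term):
    """Return every position at which `term` occurs in `field`."""
    terms = response["term_vectors"][field]["terms"]
    # A term absent from the field yields an empty list instead of a KeyError.
    tokens = terms.get(term, {}).get("tokens", [])
    return [token["position"] for token in tokens]


# Hypothetical, trimmed-down response for document 1 (assumed shape).
sample_response = {
    "term_vectors": {
        "array_space": {
            "terms": {
                "qwer": {"tokens": [{"position": 0}]},
                "njj": {"tokens": [{"position": 4}]},
            }
        }
    }
}

print(term_positions(sample_response, "array_space", "njj"))      # [4]
print(term_positions(sample_response, "array_space", "missing"))  # []
```

Returning a list rather than a single number matters when the same string appears more than once in the array, because the term then has one token entry per occurrence.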

The Python code for this can be: 用于此的Python代码可以是：

from elasticsearch import Elasticsearch

ES_HOST = {"host" : "localhost", "port" : 9200}
ES_CLIENT = Elasticsearch(hosts = [ES_HOST], timeout = 180)

def getTermVector(doc_id):
    a = ES_CLIENT.termvector\
        (index = "arrayindex",
            doc_type = "array_set",
            id = doc_id,
            field_statistics = True,
            fields = ['array_space', 'array_comma'],
            term_statistics = True)
    return a

def getElements(num, array_no):
    all_terms = getTermVector(array_no)['term_vectors']['array_space']['terms']
    for i in range(num):
        for term in all_terms:
            for jsons in  all_terms[term]['tokens']:
                if jsons['position'] == i:
                    print("%s @ index %d" % (term, i))


getElements(3, 1)

# qwer @ index 0
# qweee @ index 1
# trrww @ index 2
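
The loop above rescans every term for each index. A possible variant, assuming the same `terms` dictionary shape as in the term vector response, builds a position-to-term map once and then takes the first entries. `first_elements` and `sample_terms` are hypothetical names, and the sample data is hand-written for illustration.

```python
def first_elements(terms, num):
    """Return the terms at positions 0..num-1, in array order."""
    by_position = {}
    for term, info in terms.items():
        # A repeated element has one token entry per occurrence.
        for token in info["tokens"]:
            by_position[token["position"]] = term
    return [by_position[pos] for pos in sorted(by_position) if pos < num]


# Hand-written stand-in for response['term_vectors']['array_space']['terms'].
sample_terms = {
    "qwer": {"tokens": [{"position": 0}, {"position": 3}]},
    "qweee": {"tokens": [{"position": 1}]},
    "trrww": {"tokens": [{"position": 2}]},
}

print(first_elements(sample_terms, 2))  # ['qwer', 'qweee']
```

With a real client you would pass `getTermVector(doc_id)['term_vectors']['array_space']['terms']` as the first argument.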
