Get a specified number of array elements from an Elasticsearch query
I have an index in Elasticsearch whose records contain an array. Say the field name is "samples" and the array is:
["abc","xyz","mnp".....]
Is there a query that lets me specify how many elements to retrieve from the array? For example, I want each retrieved record to contain only the first 2 elements of the samples array.
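As far as I know, _source filtering selects whole fields and cannot slice an array, so the most direct workaround (a different technique from the analyzer-based answer below) is to fetch the array and cut it on the client. A minimal sketch, assuming each hit's _source has already been parsed into a Python dict; first_n_samples is a hypothetical helper name:

```python
def first_n_samples(source, n, field="samples"):
    """Return at most the first n elements of an array field.

    Assumes `source` is a hit's _source already decoded into a dict and that
    the field holds a JSON array (a Python list after decoding).
    A missing field yields an empty list.
    """
    values = source.get(field, [])
    return values[:n]


# Hypothetical _source of one search hit.
hit_source = {"samples": ["abc", "xyz", "mnp"]}
print(first_n_samples(hit_source, 2))  # ['abc', 'xyz']
```

This still transfers the full array over the wire; it only trims what your application sees.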
Assuming you have an array of strings in a document, I have a couple of ideas that might help you.
PUT /arrayindex/
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "spacelyzer": {
            "tokenizer": "whitespace"
          },
          "commalyzer": {
            "type": "custom",
            "tokenizer": "commatokenizer",
            "char_filter": "square_bracket"
          }
        },
        "tokenizer": {
          "commatokenizer": {
            "type": "pattern",
            "pattern": ","
          }
        },
        "char_filter": {
          "square_bracket": {
            "type": "mapping",
            "mappings": [
              "[=>",
              "]=>"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "array_set": {
      "properties": {
        "array_space": {
          "analyzer": "spacelyzer",
          "type": "string"
        },
        "array_comma": {
          "analyzer": "commalyzer",
          "type": "string"
        }
      }
    }
  }
}
POST /arrayindex/array_set/1
{
  "array_space": "qwer qweee trrww ooenriwu njj"
}
POST /arrayindex/array_set/2
{
  "array_comma": "[qwer,qweee,trrww,ooenriwu,njj]"
}
The mapping above accepts two kinds of arrays: one is a whitespace-separated string in which every token represents an array element, and the other is the bracketed form you gave. The second form arises when the array reaches Elasticsearch as a single string, for example when a Python list is turned into a string before indexing, so that
["abc","xyz","mnp".....]
would be stored as "["abc","xyz","mnp".....]".
spacelyzer tokenizes on whitespace, while commalyzer tokenizes on commas and removes [ and ] from the string.
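To make the two analyzers concrete, here is a rough pure-Python approximation of what they do to the two example values. It is only a sketch under the assumption that str.split() behaves like the whitespace tokenizer and that the pattern tokenizer drops empty pieces; it is not Elasticsearch's own code, and the function names simply mirror the analyzer names:

```python
def spacelyzer(text):
    """Rough stand-in for the whitespace tokenizer: split on runs of whitespace."""
    return text.split()


def commalyzer(text):
    """Rough stand-in for the custom commalyzer analyzer.

    1. square_bracket char filter: delete every '[' and ']'.
    2. pattern tokenizer with pattern ',': split on commas,
       dropping empty pieces (the pattern tokenizer emits no empty tokens).
    """
    stripped = text.replace("[", "").replace("]", "")
    return [piece for piece in stripped.split(",") if piece]


space_tokens = spacelyzer("qwer qweee trrww ooenriwu njj")
comma_tokens = commalyzer("[qwer,qweee,trrww,ooenriwu,njj]")

# A token's list index corresponds to the position Elasticsearch records for it.
print(space_tokens.index("njj"), comma_tokens.index("njj"))  # 4 4
```

Because positions are counted from 0 in token order, the position of a term is the same as the element's index in the original array, which is what the rest of this answer relies on.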
Now, if you call the Term Vectors API like this:
GET arrayindex/array_set/1/_termvector
{
  "fields": ["array_space", "array_comma"],
  "term_statistics": true,
  "field_statistics": true
}
GET arrayindex/array_set/2/_termvector
{
  "fields": ["array_space", "array_comma"],
  "term_statistics": true,
  "field_statistics": true
}
you can read an element's position directly from the responses. For example, to find the position of "njj", use
termvector_response["term_vectors"]["array_comma"]["terms"]["njj"]["tokens"][0]["position"]
for document 2, or
termvector_response["term_vectors"]["array_space"]["terms"]["njj"]["tokens"][0]["position"]
for document 1.
Both will give you 4, which is the actual index of "njj" in the specified array. I suggest you use the whitespace-separated design.
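Indexing straight into the response raises KeyError when the field or term is missing; document 1 has no array_comma field, so its response should contain no array_comma entry. A defensive lookup could look like the sketch below. position_of is a hypothetical helper, and the sample response is a hand-trimmed stand-in that keeps only the keys the lookup path above uses, not real server output:

```python
def position_of(termvector_response, field, term):
    """Return the position of the first occurrence of `term` in `field`.

    Returns None instead of raising KeyError when the field is absent from
    the response (e.g. array_comma for document 1) or the term is absent.
    """
    field_vectors = termvector_response.get("term_vectors", {}).get(field)
    if field_vectors is None:
        return None
    term_info = field_vectors.get("terms", {}).get(term)
    if term_info is None or not term_info.get("tokens"):
        return None
    return term_info["tokens"][0]["position"]


# Hypothetical, heavily trimmed response for document 1.
sample_response = {
    "term_vectors": {
        "array_space": {
            "terms": {
                "njj": {"tokens": [{"position": 4}]},
            }
        }
    }
}

print(position_of(sample_response, "array_space", "njj"))  # 4
print(position_of(sample_response, "array_comma", "njj"))  # None
```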
The Python code for this could be:
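from elasticsearch import Elasticsearch

# Connection settings for a local node; adjust host/port as needed.
ES_HOST = {"host": "localhost", "port": 9200}
ES_CLIENT = Elasticsearch(hosts=[ES_HOST], timeout=180)


def getTermVector(doc_id):
    # Fetch term vectors plus term and field statistics for one document.
    # Uses termvectors (the older client method name was termvector);
    # doc_type matches the array_set mapping type used above.
    return ES_CLIENT.termvectors(
        index="arrayindex",
        doc_type="array_set",
        id=doc_id,
        fields=["array_space", "array_comma"],
        field_statistics=True,
        term_statistics=True,
    )


def getElements(num, array_no):
    # Terms of the whitespace-separated field for document `array_no`.
    all_terms = getTermVector(array_no)["term_vectors"]["array_space"]["terms"]
    # For each wanted index, print the term whose token sits at that position.
    for i in range(num):
        for term in all_terms:
            for token in all_terms[term]["tokens"]:
                if token["position"] == i:
                    print(term, "@ index", i)


getElements(3, 1)
# qwer @ index 0
# qweee @ index 1
# trrww @ index 2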
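The nested loop in getElements rescans every term once per requested index. A variant that maps positions to terms in a single pass and returns a list instead of printing might look like this; first_elements is a hypothetical name, and the terms dict is a hand-built stand-in shaped like the array_space part of document 1's response (term vector responses list terms alphabetically):

```python
def first_elements(terms, num):
    """Rebuild the first `num` array elements from a term-vector `terms` dict.

    `terms` is response["term_vectors"][field]["terms"]. Every token position
    is mapped back to its term in one pass, then positions 0..num-1 are read
    off in order; positions that do not occur are skipped.
    """
    by_position = {}
    for term, info in terms.items():
        for token in info.get("tokens", []):
            by_position[token["position"]] = term
    return [by_position[i] for i in range(num) if i in by_position]


# Hypothetical terms dict for document 1 (array_space field).
terms = {
    "njj": {"tokens": [{"position": 4}]},
    "ooenriwu": {"tokens": [{"position": 3}]},
    "qweee": {"tokens": [{"position": 1}]},
    "qwer": {"tokens": [{"position": 0}]},
    "trrww": {"tokens": [{"position": 2}]},
}

print(first_elements(terms, 3))  # ['qwer', 'qweee', 'trrww']
```

Against a live index you would pass getTermVector(1)["term_vectors"]["array_space"]["terms"]. Note that the terms are the analyzed tokens, so any lowercasing or other filtering in the analyzer shows up in the returned elements.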