Loading irregular JSON into an Elasticsearch index with a mapping using the Python client
I have some .json files in which not every field is present in every record. For example, caseclass.json looks like:
[
    {
        "name": "john smith",
        "age": 12,
        "cars": ["ford", "toyota"],
        "comment": "i am happy"
    },
    {
        "name": "a. n. other",
        "cars": "",
        "comment": "i am panicking"
    }
]
I am using Elasticsearch 7.6.1 via the Python client elasticsearch:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import json
import os
from elasticsearch_dsl import Document, Text, Date, Integer, analyzer

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

class Person(Document):
    class Index:
        using = es
        name = 'person_index'

    name = Text()
    age = Integer()
    cars = Text()
    comment = Text(analyzer='snowball')

Person.init()

with open("caseclass.json") as json_file:
    data = json.load(json_file)

for indexid in range(len(data)):
    document = Person(name=data[indexid]['name'], age=data[indexid]['age'], cars=data[indexid]['cars'], comment=data[indexid]['comment'])
    document.meta.id = indexid
    document.save()
Naturally, I get KeyError: 'age' when the second record is read.

My question is: is it possible to load such records into an Elasticsearch index using the Python client and a predefined mapping, instead of dynamic mapping? The code above works if all fields are present in all records, but is there a way to do this without checking for the presence of each field in each record? The actual records have a complex structure, and there are millions of them. Thanks.
The error has nothing to do with your mapping. It's just telling you that the key age could not be accessed in one of the records in your caseclass.json (the second record has no age key).
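To make the source of the error concrete, here is a minimal sketch that reproduces it without Elasticsearch at all. The two records are inlined from caseclass.json so the snippet runs on its own, and the helper names read_age_strict and read_age_lenient are hypothetical, introduced only for illustration.

```python
# The two records from caseclass.json, inlined so the snippet is self-contained.
records = [
    {"name": "john smith", "age": 12, "cars": ["ford", "toyota"], "comment": "i am happy"},
    {"name": "a. n. other", "cars": "", "comment": "i am panicking"},
]


def read_age_strict(record):
    # Plain indexing raises KeyError when the key is absent.
    return record["age"]


def read_age_lenient(record):
    # dict.get returns None instead of raising when the key is absent.
    return record.get("age")


error_name = None
missing_key = None
try:
    read_age_strict(records[1])
except KeyError as exc:
    # The exception comes from the dict lookup, before any Elasticsearch call.
    error_name = type(exc).__name__
    missing_key = exc.args[0]

print(error_name, missing_key)
print(read_age_lenient(records[1]))
```

The exception is raised by the dictionary lookup itself, so it would occur even if no index or mapping existed.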
The index mapping is created when you call Person.init(). You can verify that by calling print(es.indices.get_mapping(Person.Index.name)) right after Person.init().
I've cleaned up your code a bit:
import json

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Document, Text, Integer

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

class Person(Document):
    class Index:
        using = es
        name = 'person_index'

    name = Text()
    age = Integer()
    cars = Text()
    comment = Text(analyzer='snowball')

# Create the index with the mapping declared on Person.
Person.init()
print(es.indices.get_mapping(Person.Index.name))

with open("caseclass.json") as json_file:
    data = json.load(json_file)

for indexid, case in enumerate(data):
    # Pass only the keys that are present in this record.
    document = Person(**case)
    document.meta.id = indexid
    document.save()
Notice how I used **case to unpack all key-value pairs of a case as keyword arguments, instead of reading each one with data[indexid][property_key]. A key that is missing from a record is simply not passed, so no KeyError is raised.
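The effect of the ** unpacking can be seen in isolation with a small sketch. FakePerson below is a hypothetical stand-in for the Person document, not part of elasticsearch_dsl; it only records the keyword arguments it receives, which is enough to show that absent keys are never touched.

```python
class FakePerson:
    """Hypothetical stand-in for Person: stores whatever keyword arguments it is given."""

    def __init__(self, **fields):
        self.fields = fields

    def to_dict(self):
        # Return a copy of the fields that were actually supplied.
        return dict(self.fields)


# The two records from caseclass.json, inlined so the snippet is self-contained.
records = [
    {"name": "john smith", "age": 12, "cars": ["ford", "toyota"], "comment": "i am happy"},
    {"name": "a. n. other", "cars": "", "comment": "i am panicking"},
]

# ** unpacks each dict into keyword arguments, so only existing keys are passed.
documents = [FakePerson(**case) for case in records]

for document in documents:
    print(sorted(document.to_dict()))
```

The second document ends up with no age entry at all, rather than failing on a lookup for it.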
The generated mapping is as follows:
{
  "person_index" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "cars" : {
          "type" : "text"
        },
        "comment" : {
          "type" : "text",
          "analyzer" : "snowball"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}
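Since the question mentions millions of records, saving one document per request may be slow. The following is a minimal sketch, not part of the original answer, of how the records could be turned into bulk actions once Person.init() has created the mapping above. It assumes the bulk helper from the elasticsearch package (elasticsearch.helpers.bulk) would be used to send them; generate_actions is a hypothetical name, and the part that needs a running cluster is left commented out.

```python
def generate_actions(records, index_name="person_index"):
    """Yield one bulk action per record (hypothetical helper, for illustration only)."""
    for doc_id, record in enumerate(records):
        # Missing fields are simply left out of _source; no per-field check is needed.
        yield {"_index": index_name, "_id": doc_id, "_source": record}


# The two records from caseclass.json, inlined so the snippet is self-contained.
records = [
    {"name": "john smith", "age": 12, "cars": ["ford", "toyota"], "comment": "i am happy"},
    {"name": "a. n. other", "cars": "", "comment": "i am panicking"},
]

actions = list(generate_actions(records))
print(actions[1])

# Assumption: with a running cluster, and after Person.init() has been called,
# the actions could be sent in batches like this:
#
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
#   helpers.bulk(es, generate_actions(records))
```

Because the index already exists with its predefined mapping, the documents are indexed against that mapping, and a record that lacks age is stored without that field.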