Elasticsearch Python Client indexing JSON

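I'm having trouble while playing with the Elasticsearch Python client. I have (valid!) JSON in a file called test.json, and I now want to index that JSON in Elasticsearch. I tried this little tutorial to check whether I can connect to my local Elasticsearch instance, and it worked, so I assume the problem is not the connection to Elasticsearch.

When I run my little piece of code here: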

from elasticsearch import Elasticsearch
import json

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

with open('test.json') as json_data:
    es.index(index='testdata', doc_type='generated', id=1, body=json.load(json_data))

I get this exception on the command line (mapper_parsing_exception?):

Traceback (most recent call last):
  File "app.py", line 13, in <module>
    es.index(index='testdata', doc_type='generated', id=1, body=json.load(json_data))
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 124, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'mapper_parsing_exception', u'failed to parse')

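Can you point me in the right direction as to what the problem might be?

Oh, and yes, printing "json.load(json_data)" works perfectly, which means that loading the JSON from the file is not the problem.

Thanks for your help! Greetz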

Update:

with open('test.json') as json_data:
    #d = json.load(json_data)
    print(json_data)
    es.index(index='testdata', doc_type='generated', id=1, body=json_data)

This code does not work either; I cannot even print the JSON to the command line.

The error is now:

<open file 'test.json', mode 'r' at 0x7f8329340c00>
Traceback (most recent call last):
  File "app.py", line 14, in <module>
    es.index(index='testdata', doc_type='generated', id=1, body=json_data)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 284, in perform_request
    body = self.serializer.dumps(body)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/serializer.py", line 50, in dumps
    raise SerializationError(data, e)
elasticsearch.exceptions.SerializationError: (<closed file 'test.json', mode 'r' at 0x7f8329340c00>, TypeError("Unable to serialize <open file 'test.json', mode 'r' at 0x7f8329340c00> (type: <type 'file'>)",))

This is the content of the test.json file (just some randomly generated JSON):

[
     {
        "_id": "58ee19e75ffc814d4dff17da",
        "index": 0,
        "guid": "45476739-80b3-49de-8f00-9923f84f56ce",
        "isActive": true,
        "balance": "$2,882.08",
        "picture": "http://placehold.it/32x32",
        "age": 31,
        "eyeColor": "blue",
        "name": "Liliana Odom",
        "gender": "female",
        "company": "PLASTO",
        "email": "lilianaodom@plasto.com",
        "phone": "+1 (983) 474-3785",
        "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593",
        "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.",
        "registered": "2015-05-07T05:40:28 -02:00",
        "latitude": -46.141522,
        "longitude": -157.943368,
        "tags": [
          "labore",
          "quis"
        ],
        "friends": [
          {
            "id": 0,
            "name": "Earline Bass"
          }
        ],
        "greeting": "Hello, Liliana Odom! You have 5 unread messages.",
        "favoriteFruit": "apple"
      }
    ]

Update 2:

I have now tried this:

id = 1
with open('test.json') as json_data:
    data = json.load(json_data)
    for dat in data:
        print(json.dumps(dat))
        es.index(index='testdata', doc_type='generated', id=id, body=json.dumps(dat))
        id += 1

print(json.dumps(dat)) works, but now I get an IllegalArgumentException:

Traceback (most recent call last):
  File "app.py", line 15, in <module>
    es.index(index='testdata', doc_type='generated', id=id, body=json.dumps(dat))
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 124, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'[Bloodstorm][127.0.0.1:9300][indices:data/write/index[p]]')

Update 3: Here is the ES log. It looks like the _id field is defined twice in this index.

[2017-04-12 17:43:07,847][DEBUG][action.index             ] [Bloodstorm] failed to execute [index {[testdata][generated][AVti1SY7fn4azWzi8gyQ], source[{"guid": "45476739-80b3-49de-8f00-9923f84f56ce", "index": 0, "favoriteFruit": "apple", "latitude": -46.141522, "company": "PLASTO", "email": "lilianaodom@plasto.com", "picture": "http://placehold.it/32x32", "tags": ["labore", "quis"], "registered": "2015-05-07T05:40:28 -02:00", "eyeColor": "blue", "phone": "+1 (983) 474-3785", "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593", "friends": [{"id": 0, "name": "Earline Bass"}], "isActive": true, "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.", "balance": "$2,882.08", "name": "Liliana Odom", "gender": "female", "age": 31, "greeting": "Hello, Liliana Odom! You have 5 unread messages.", "longitude": -157.943368, "_id": "58ee19e75ffc814d4dff17da"}]}] on [[testdata][3]]
java.lang.IllegalArgumentException: Field [_id] is defined twice in [generated]
        at org.elasticsearch.index.mapper.MapperService.checkFieldUniqueness(MapperService.java:496)
        at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:376)
        at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320)
        at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:306)
        at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:230)
        at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:480)
        at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:784)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
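
The log message points at the `_id` key that sits inside the document body itself. One way to spot such keys before indexing is to compare each document's top-level keys against a set of metadata field names. The sketch below is illustrative only: `RESERVED_FIELDS` and `reserved_keys` are hypothetical names, and the set is a small assumed subset, not a list taken from the Elasticsearch documentation.

```python
# Assumption: an illustrative, non-exhaustive subset of metadata field names.
RESERVED_FIELDS = {'_id', '_index', '_type'}


def reserved_keys(doc):
    """Return the sorted top-level keys of doc that collide with metadata field names."""
    return sorted(key for key in doc if key in RESERVED_FIELDS)


# A trimmed-down version of the document from test.json.
doc = {
    '_id': '58ee19e75ffc814d4dff17da',
    'index': 0,
    'friends': [{'id': 0, 'name': 'Earline Bass'}],
}
print(reserved_keys(doc))
```

Only the top-level `_id` is reported here; the plain `index` key and the nested `id` inside `friends` are ordinary fields and are left alone.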

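Given the structure of your test.json file, you need to parse it and then iterate over each document in the array. Because _id is a metadata field, take it out of the document body and pass it as the document id instead: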

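# Assumes `json` is imported and `es` is the client created in the question.
with open('test.json') as raw_data:
    # json.load parses a file object (json.loads expects a string)
    json_docs = json.load(raw_data)
    for json_doc in json_docs:
        # Remove _id from the body and use it as the document id
        my_id = json_doc.pop('_id', None)
        es.index(index='testdata', doc_type='generated', id=my_id, body=json.dumps(json_doc))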
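
For files with many documents, the same idea could be written as a generator of bulk actions rather than one `es.index` call per document. This is a minimal sketch, assuming the index and type names from the question; `build_actions` is a hypothetical helper name, and the action keys (`_index`, `_type`, `_id`, `_source`) follow the format accepted by the client's `helpers.bulk` function.

```python
import json


def build_actions(docs, index='testdata', doc_type='generated'):
    """Yield one bulk action per document, moving '_id' out of the body."""
    for doc in docs:
        body = dict(doc)                # shallow copy so the input is not mutated
        doc_id = body.pop('_id', None)  # metadata field, must not stay in the body
        action = {'_index': index, '_type': doc_type, '_source': body}
        if doc_id is not None:
            action['_id'] = doc_id      # omit the key to let Elasticsearch pick an id
        yield action


# A shortened stand-in for the contents of test.json.
sample = json.loads(
    '[{"_id": "58ee19e75ffc814d4dff17da", "index": 0, "name": "Liliana Odom"}]'
)
actions = list(build_actions(sample))
print(json.dumps(actions[0], sort_keys=True))
```

With a connected client, the actions could then be sent in one request via `helpers.bulk(es, build_actions(json_docs))`, where `helpers` is imported from the `elasticsearch` package.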

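You can remove the square brackets from the test.json file, so that it holds a single JSON object rather than an array, and then try again: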

{
        "_id": "58ee19e75ffc814d4dff17da",
        "index": 0,
        "guid": "45476739-80b3-49de-8f00-9923f84f56ce",
        "isActive": true,
        "balance": "$2,882.08",
        "picture": "http://placehold.it/32x32",
        "age": 31,
        "eyeColor": "blue",
        "name": "Liliana Odom",
        "gender": "female",
        "company": "PLASTO",
        "email": "lilianaodom@plasto.com",
        "phone": "+1 (983) 474-3785",
        "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593",
        "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.",
        "registered": "2015-05-07T05:40:28 -02:00",
        "latitude": -46.141522,
        "longitude": -157.943368,
        "tags": [
          "labore",
          "quis"
        ],
        "friends": [
          {
            "id": 0,
            "name": "Earline Bass"
          }
        ],
        "greeting": "Hello, Liliana Odom! You have 5 unread messages.",
        "favoriteFruit": "apple"
      }
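
Editing the file by hand is not the only option: a loader could accept either shape and always hand back a list of documents. The following is a sketch under that assumption; `load_documents` is a hypothetical helper, not part of the Elasticsearch client.

```python
import json


def load_documents(text):
    """Parse JSON text and always return a list of document dicts."""
    parsed = json.loads(text)
    if isinstance(parsed, dict):
        return [parsed]   # single object: wrap it in a list
    if isinstance(parsed, list):
        return parsed     # array of objects: use as is
    raise ValueError('expected a JSON object or an array of objects')


# Both shapes of test.json discussed in this thread, shortened.
single = load_documents('{"_id": "58ee19e75ffc814d4dff17da", "index": 0}')
array = load_documents('[{"index": 0}, {"index": 1}]')
print(len(single), len(array))
```

Whichever shape the file has, the ES log in the question suggests that the `_id` key still has to be taken out of each document body before indexing.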

