
Import/Index a JSON file into Elasticsearch

I am new to Elasticsearch and have been entering data manually up until this point. For example, I've done something like this:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}'

I now have a .json file and I want to index it into Elasticsearch. I've tried something like this too, but with no success:

curl -XPOST 'http://jfblouvmlxecs01:9200/test/test/1' -d lane.json

How do I import a .json file? Are there steps I need to take first to ensure the mapping is correct?

If you want to send a file with curl, the right command is:

curl -XPOST -H 'Content-Type: application/json' 'http://jfblouvmlxecs01:9200/test/_doc/1' -d @lane.json
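For comparison, the same single-document request can be built with Python's standard library. This is a minimal sketch, assuming the file holds exactly one JSON object; `build_index_request` is a hypothetical helper name. It uses `PUT /<index>/_doc/<id>` and sets the `Content-Type: application/json` header that Elasticsearch 6.0 and later require.

```python
import json
import urllib.request


def build_index_request(base_url: str, index: str, doc_id: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a request that indexes one JSON document read from a file."""
    with open(path, "rb") as f:
        body = f.read()
    # Fail early if the file is not a single valid JSON document.
    json.loads(body)
    url = f"{base_url.rstrip('/')}/{index}/_doc/{doc_id}"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_index_request("http://localhost:9200", "test", "1", "lane.json"))`, assuming a node is listening on that address.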

Elasticsearch is schemaless, so you don't necessarily need a mapping. If you send the JSON as it is and rely on the default (dynamic) mapping, every field will be indexed, and string fields will be analyzed using the standard analyzer.

If you want to interact with Elasticsearch through the command line, you may want to have a look at elasticshell, which should be a little handier than curl.

2019-07-10: It should be noted that custom mapping types are deprecated and should not be used. I updated the type in the URL above to `_doc`, which also makes it easier to see which part is the index and which is the type; having both named "test" was confusing.

Per the current docs, https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html:

If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d. The latter doesn't preserve newlines.

Example:

$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests
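The Python counterpart of `--data-binary` is to read the file as raw bytes, so every newline reaches Elasticsearch untouched. A minimal sketch using only the standard library; `build_bulk_request` is a hypothetical helper name. The bulk API documentation recommends the `application/x-ndjson` content type (Elasticsearch also accepts `application/json`, as in the curl commands in this thread).

```python
import urllib.request


def build_bulk_request(base_url: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a _bulk request whose body is the file's exact bytes."""
    # Binary mode keeps every "\n" intact, which is what curl's
    # --data-binary does and plain -d does not.
    with open(path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```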

We made a small tool for this kind of thing: https://github.com/taskrabbit/elasticsearch-dump

One thing I haven't seen anyone mention: for every line of the "pure" JSON file, the file must contain a preceding line that specifies which index the next line belongs to.

For example:

{"index":{"_index":"shakespeare","_type":"act","_id":0}}
{"line_id":1,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"ACT I"}

Without that, nothing works, and it won't tell you why.
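If your file is a plain JSON array, the action lines can be generated rather than written by hand. A minimal sketch, assuming the input has already been parsed into a list of objects; the function name `to_bulk_ndjson` and the use of 1-based positions as `_id` values are illustrative choices, not part of the answer above. It omits `_type`, which Elasticsearch 7 deprecated and 8 removed.

```python
import json
from typing import Iterable


def to_bulk_ndjson(docs: Iterable[dict], index: str) -> str:
    """Return a _bulk body: an action line before each document, newline-terminated."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        # Action/metadata line: which index the following line belongs to, and its ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": i}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with "\n".
    return "".join(line + "\n" for line in lines)
```

For a file on disk you would `json.load` it first, write the returned text to a new file, and send that file with `--data-binary`.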

I'm the author of elasticsearch_loader.
I wrote ESL for this exact problem.

You can install it with pip:

pip install elasticsearch-loader

Then you will be able to load JSON files into Elasticsearch by running:

elasticsearch_loader --index incidents --type incident json file1.json file2.json

I just made sure that I was in the same directory as the JSON file and then simply ran this:

curl -s -H "Content-Type: application/json" -XPOST localhost:9200/product/default/_bulk?pretty --data-binary @product.json

So make sure you are in the same directory too, and run it this way. Note: product/default/ in the command is specific to my environment; you can omit it or replace it with whatever is relevant to you.

Adding to KenH's answer:

$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests

You can replace @requests with @complete_path_to_json_file.

Note: the @ before the file path is important.

Just get Postman from https://www.getpostman.com/docs/environments and give it the file location with the /test/test/1/_bulk?pretty command.

You are using:

$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests

If 'requests' is a JSON file, then you have to change this to:

$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests.json

But before that, if your JSON file is not yet in bulk format (it has no index lines), you have to insert an index line before each line inside the JSON file. You can do this with jq. Refer to this link: http://kevinmarsh.com/2014/10/23/using-jq-to-import-json-into-elasticsearch.html

Go to the Elasticsearch tutorials (for example, the Shakespeare tutorial), download the sample JSON file used there, and have a look at it. In front of each JSON object (each individual line) there is an index line. This is what you are looking for after running the jq command. This format is mandatory for the bulk API; plain JSON files won't work.
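A converted file can be sanity-checked before it is sent, which helps with the "nothing works and it won't tell you why" problem mentioned in another answer. A rough sketch; `check_bulk_lines` is a hypothetical name. It assumes the bulk rules that `index`, `create` and `update` are each followed by exactly one body line while `delete` is not, and it only checks structure, not mappings.

```python
import json

# Bulk actions that are followed by exactly one source/body line; "delete" is not.
ACTIONS_WITH_BODY = {"index", "create", "update"}


def check_bulk_lines(text: str) -> list[str]:
    """Return human-readable problems in a bulk (NDJSON) payload; [] means it looks OK."""
    problems = []
    if not text.endswith("\n"):
        problems.append("payload does not end with a newline")
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the empty string after the final newline is not a line
    i = 0
    while i < len(lines):
        n = i + 1  # 1-based line number for messages
        try:
            action = json.loads(lines[i])
        except json.JSONDecodeError:
            problems.append(f"line {n}: not valid JSON")
            i += 1
            continue
        if not (isinstance(action, dict) and len(action) == 1):
            problems.append(f"line {n}: expected a single-key action line such as index or create")
            i += 1
            continue
        name = next(iter(action))
        if name == "delete":
            i += 1
        elif name in ACTIONS_WITH_BODY:
            if i + 1 >= len(lines):
                problems.append(f"line {n}: '{name}' action has no following source line")
            else:
                try:
                    json.loads(lines[i + 1])
                except json.JSONDecodeError:
                    problems.append(f"line {n + 1}: source line is not valid JSON")
            i += 2
        else:
            problems.append(f"line {n}: unknown action '{name}'")
            i += 1
    return problems
```

A plain JSON file with one document per line typically fails on line 1, because a document with several fields is not a single-key action line.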

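As of Elasticsearch 7.7, you also have to specify the content type (strict Content-Type checking has in fact been enforced since 6.0):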

curl -s -H "Content-Type: application/json" -XPOST localhost:9200/_bulk --data-binary @<absolute path to JSON file>

I wrote some code to expose the Elasticsearch API via a filesystem API.

It is a good idea, for example, for clean export/import of data.

I created a prototype, elasticdriver. It is based on FUSE.


  • If you are using Elasticsearch 7.7 or above, use the command below.

    curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary @"/Users/waseem.khan/waseem/elastic/account.json"

  • In the command above, the file path is /Users/waseem.khan/waseem/elastic/account.json.

  • If you are using Elasticsearch 6.x, you can use the command below. Keep the URL in quotes; otherwise the shell treats & as "run in background" and cuts the URL short.

    curl -X POST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary @"/Users/waseem.khan/waseem/elastic/account.json" -H 'Content-Type: application/json'

Note: make sure your .json file ends with a newline (an empty line at the end); otherwise you will get the exception below.

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "The bulk request must be terminated by a newline [\n]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "The bulk request must be terminated by a newline [\n]"
  },
  "status" : 400
}
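If the file is generated or edited by hand, the missing final newline is easy to fix programmatically. A minimal sketch; `ensure_trailing_newline` is a hypothetical helper that appends a single newline byte only when the file does not already end with one, so running it twice is harmless.

```python
import os


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline byte if the file does not already end with one.

    Returns True if the file was modified.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: nothing to terminate
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already terminated
        f.seek(0, os.SEEK_END)  # explicit seek before switching from reading to writing
        f.write(b"\n")
        return True
```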

If you are using Ubuntu, either directly or inside VirtualBox, this can be useful:

wget https://github.com/andrewvc/ee-datasets/archive/master.zip
sudo apt-get install unzip   # only needed if unzip is not already installed
unzip master.zip
cd ee-datasets-master
java -jar elastic-loader.jar http://localhost:9200 datasets/movie_db.eloader

If you want to import a JSON file into Elasticsearch and create an index, use this Python script:

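import json

from elasticsearch import Elasticsearch

# Assumes elasticsearch-py 8.x; mapping types (doc_type) were removed in Elasticsearch 8.
es = Elasticsearch("http://localhost:9200")

# el_dharan.json is expected to contain a JSON array of objects.
with open('el_dharan.json') as raw_data:
    json_docs = json.load(raw_data)

# Index each object individually, using its 1-based position as the document ID.
# The client serializes the dict itself, so json.dumps is not needed.
for i, json_doc in enumerate(json_docs, start=1):
    es.index(index='ind_dharan', id=i, document=json_doc)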
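Calling `es.index` once per document sends one HTTP request each, which gets slow for large files. The Python client also ships `elasticsearch.helpers.bulk`, which accepts an iterable of action dictionaries; below is a sketch of generating those actions from the same file. The function name `iter_actions` is hypothetical, and using list positions as `_id` mirrors the counter in the script above.

```python
import json
from typing import Iterator


def iter_actions(path: str, index: str) -> Iterator[dict]:
    """Yield one bulk action dict per object in a JSON array file."""
    with open(path) as f:
        docs = json.load(f)
    for i, doc in enumerate(docs, start=1):
        # _index and _id are metadata; _source is the document body itself.
        yield {"_index": index, "_id": i, "_source": doc}
```

With a client `es` as above, `from elasticsearch import helpers` followed by `helpers.bulk(es, iter_actions("el_dharan.json", "ind_dharan"))` sends the documents in batches instead of one request per document.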

