
ElasticSearch JSON file import (Bulk API)

I saw a few similar posts here on StackOverflow, but I still don't have a clear understanding of how to index a large file of JSON documents into ElasticSearch. I'm getting responses like the following:

{"error":"ActionRequestValidationException[Validation Failed: 1: index is missing;2: type is missing;]","status":400}

{"took":231,"errors":false,"items":[{"index":{"_index":"test","_type":"type1","_id":"1","_version":7,"status":200}}]

I have a JSON file that is about 2 GB in size, which is the file I actually want to import. But first, in order to understand how the Bulk API works, I created a small file with just a single line of actual data:

testfile.json

{"index":{"_id":"someId"}} \n
{"id":"testing"}\n

I got this from another post on SO. I understand that the first line is a header, and that the "index" in the first line is the command that will be sent to ES; however, this still does not work. Can someone please give me a working example and a clear explanation of how to import a JSON file into ES?

Thank you!

The following sample comes from the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html?q=bulk

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }

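So line one tells Elasticsearch to index the document on line two into index test, type type1, with _id 1. The document it indexes is the one containing field1. If all actions go to the same index and type, you can put the index and type in the URL instead of repeating them on every action line. The "index is missing; type is missing" error in the question is consistent with this: the action line there carries only an _id, so the index and type have to come from either the action line or the URL. Check the link for more samples.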
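As a rough illustration of the action-line/document-line pairing described above, here is a minimal Python sketch that builds a bulk body in memory. The helper name build_bulk_body is made up for this example, and the index and type names are simply the ones from the sample; it only produces the text and does not talk to Elasticsearch.

```python
import json


def build_bulk_body(docs, index, doc_type):
    """Return a bulk body: one action line plus one source line per document.

    `docs` is an iterable of (doc_id, source_dict) pairs (a hypothetical
    input shape chosen for this sketch).
    """
    lines = []
    for doc_id, source in docs:
        # Action line: tells Elasticsearch what to do and where.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body([("1", {"field1": "value1"})], "test", "type1")
print(body, end="")
```

Each document has to stay on one physical line, which is why the sketch serializes with json.dumps and no indentation.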

In line three you see an example of a delete action. A delete action is not followed by a document line, so line four already starts the next action.
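To make the "delete has no document line" rule concrete, the following sketch walks over a bulk body and pairs each action with its document, if it has one. The function name iter_bulk_actions is hypothetical, and this is only an illustration of the file layout, not how Elasticsearch parses requests internally.

```python
import json


def iter_bulk_actions(body):
    """Yield (action_name, metadata, source_or_None) for each action in a bulk body."""
    lines = (line for line in body.split("\n") if line.strip())
    for line in lines:
        # An action line is a JSON object with exactly one key: the action name.
        (name, meta), = json.loads(line).items()
        # index, create and update are followed by a document line; delete is not.
        source = None if name == "delete" else json.loads(next(lines))
        yield name, meta, source


sample = """{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
"""

actions = list(iter_bulk_actions(sample))
for name, meta, source in actions:
    print(name, meta["_id"], source)
```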

Be careful with very large files; 2 GB is probably too big for a single request. The whole request has to be sent to Elasticsearch first, which loads it into memory, so there is a limit to the number of records you can send at once.
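One way to stay under that limit is to split the input into smaller bulk requests. The sketch below assumes the large file holds one JSON document per line, which the question does not confirm; the function name iter_bulk_chunks and the default chunk size of 1000 are arbitrary choices for illustration. It only builds the request bodies; each one would then be sent as a separate bulk request.

```python
import json
from itertools import islice


def iter_bulk_chunks(doc_lines, index, doc_type, chunk_size=1000):
    """Yield bulk bodies covering at most `chunk_size` input lines each.

    `doc_lines` is any iterable of strings, one JSON document per line
    (an assumption about the input file), e.g. an open file object.
    """
    it = iter(doc_lines)
    # No _id is given, so Elasticsearch generates one for each document.
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    while True:
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        parts = []
        for raw in batch:
            raw = raw.strip()
            if not raw:
                continue  # skip blank lines
            json.loads(raw)  # fail early on a malformed document
            parts.append(action)
            parts.append(raw)
        if parts:
            yield "\n".join(parts) + "\n"


# With a real file this would be: for chunk in iter_bulk_chunks(open(path), ...)
demo_lines = ['{"id": 1}\n', '{"id": 2}\n', '{"id": 3}\n']
chunks = list(iter_bulk_chunks(demo_lines, "test", "type1", chunk_size=2))
print(len(chunks))
```

Because the file is read lazily, only one chunk is held in memory at a time on the client side.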
