Elasticsearch: bulk indexing JSON data

Question

I am trying to bulk index a JSON file into a new Elasticsearch index and am unable to do so. I have the following sample data inside the JSON:

[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},
{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},
{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},
{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},
{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},
{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},
{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},
{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},
{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},
{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},
{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},
{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},
{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},
{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},
{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]

I am using:

 curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json

When I try to use the standard bulk index API from Elasticsearch, I get this error:

 error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}

Can anyone help with indexing this type of JSON?

Answer 1

What you need to do is read that JSON file and then build a bulk request in the format expected by the _bulk endpoint, i.e. one line for the command and one line for the document, separated by a newline character. Rinse and repeat for each document:

curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'

Just make sure to replace your_index and your_type with the actual index and type names you're using.

UPDATE

Note that the command line (the action line that precedes each document) can be shortened by removing _index and _type if those are specified in your URL. It is also possible to remove _id if you specify the path to your id field in your mapping (note, though, that this feature will be deprecated in ES 2.0). At minimum, the command line can be {"index":{}} for every document, but it is always mandatory, because it specifies which kind of operation you want to perform (in this case, index the document).

UPDATE 2

curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary @/home/data1.json

/home/data1.json should look like this:

{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}
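If the data starts out as a JSON array like the one in the question, a small conversion can produce this file. The following is a minimal POSIX shell sketch, not an Elasticsearch tool: to_bulk is a hypothetical helper name, and it assumes each object sits on its own line (as in the question's data), so it only strips the array brackets and separating commas rather than actually parsing JSON.

```shell
# to_bulk FILE: hypothetical helper that converts a JSON array written one
# object per line into the bulk format shown above.
# Assumption: no object spans several lines, and nothing follows an object on
# its line except a separating comma or the closing "]".
to_bulk() {
  # Drop Windows carriage returns, strip the array's opening "[", its closing
  # "]" and the comma between objects, then delete lines left empty.
  tr -d '\r' < "$1" |
    sed -e 's/^[[:space:]]*\[//' \
        -e 's/[]][[:space:]]*$//' \
        -e 's/,[[:space:]]*$//' \
        -e '/^[[:space:]]*$/d' |
    while IFS= read -r doc || [ -n "$doc" ]; do
      # One action line, then the document; every line ends with a newline,
      # so the body also ends with the newline that _bulk requires.
      printf '%s\n%s\n' '{"index":{}}' "$doc"
    done
}
```

Usage: to_bulk /home/data1.json > /home/bulk.json, then post bulk.json with --data-binary as in the command above. Use --data-binary rather than -d, because curl's -d @file strips newlines, which destroys the line structure the _bulk endpoint relies on. For arrays with an arbitrary layout a real JSON tool is safer, for example jq -c '.[] | {"index": {}}, .' data1.json.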

UPDATE 3

You can refer to this answer to see how to generate the new JSON-style file mentioned in UPDATE 2.

Answer 2

As of today, 6.1.2 is the latest version of Elasticsearch, and the curl command that works for me on Windows (x64) is:

curl -s -XPOST localhost:9200/my_index/my_index_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @D:\data\mydata.json

The format of the data in mydata.json remains the same as shown in @val's answer.

Answer 3

A valid Elasticsearch bulk API request would look something like this (ending with a newline):

POST http://localhost:9200/products_slo_development_temp_2/productModel/_bulk

{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Stol"} 
{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Miza"}

Elasticsearch bulk API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
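Before posting a file, it can help to check this line structure locally. Below is a rough POSIX shell/awk sketch; check_bulk is a hypothetical name, it only understands index actions, and it does not validate the JSON itself. It is deliberately strict (for example, it rejects blank lines between pairs), so it may flag bodies that Elasticsearch would accept. A JSON array like the one in the question fails on its first line, because an array is not an action line.

```shell
# check_bulk FILE: hypothetical pre-flight check for a bulk body that uses only
# "index" actions. It checks line structure, not JSON validity.
check_bulk() {
  # The body must be non-empty and its last byte must be a newline (0x0a).
  if [ ! -s "$1" ]; then
    echo "check_bulk: empty body" >&2
    return 1
  fi
  if [ "$(tail -c 1 "$1" | od -An -tx1 | tr -d ' \n')" != "0a" ]; then
    echo "check_bulk: body does not end with a newline" >&2
    return 1
  fi
  # Odd-numbered lines must be index action lines, even-numbered lines objects.
  awk '
    NR % 2 == 1 && $0 !~ /^[ \t]*[{][ \t]*"index"[ \t]*:/ {
      printf "line %d: expected an index action line\n", NR > "/dev/stderr"
      bad = 1
    }
    NR % 2 == 0 && $0 !~ /^[ \t]*[{]/ {
      printf "line %d: expected a JSON document\n", NR > "/dev/stderr"
      bad = 1
    }
    END {
      # An odd line count means the last action line has no document.
      if (NR % 2 == 1) {
        print "last action line has no document" > "/dev/stderr"
        bad = 1
      }
      exit bad
    }' "$1"
}
```

The return status is 0 when every check passes, so it can gate an upload, e.g. check_bulk bulk.json && curl ... --data-binary @bulk.json.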

This is how I do it:

I send a POST HTTP request in which the uri variable holds the URI/URL of the request and the elasticsearchJson variable holds the JSON sent in the request body, formatted for the Elasticsearch bulk API:

var uri = @"/" + indexName + "/productModel/_bulk";
var json = JsonConvert.SerializeObject(sqlResult);
var elasticsearchJson = GetElasticsearchBulkJsonFromJson(json, "RequestedCountry");

Helper method that generates the JSON format required by the Elasticsearch bulk API:

public string GetElasticsearchBulkJsonFromJson(string jsonStringWithArrayOfObjects, string firstParameterNameOfObjectInJsonStringArrayOfObjects)
{
  return @"{ ""index"":{ } } 
" + jsonStringWithArrayOfObjects.Substring(1, jsonStringWithArrayOfObjects.Length - 2).Replace(@",{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""", @" 
{ ""index"":{ } } 
{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""") + @"
";
}

The first property/field in my JSON object is the RequestedCountry property, which is why I use it in this example.

productModel is my Elasticsearch document type. sqlResult is a C# generic list of products.

Answer 4

This answer is for Elasticsearch 7.x onwards, where _type is deprecated. As others have mentioned, you can read the file programmatically and construct a request body as described below. Also, I see that each of your JSON objects has an Id attribute, so you could set the document's internal id (_id) to the same value. The updated _bulk API request would look like this:

HTTP method: POST

URI: /<index_name>/_bulk

Request body (should end with a newline):

{"index":{"_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{"_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
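For a file shaped like the question's array, this body can be generated with a short script. The following is a minimal POSIX shell sketch under the assumption of one object per line; to_bulk_with_ids is a hypothetical name, and pulling the Id value out with sed only works because Id is a plain quoted string in this data (no escaped quotes, no nesting). Objects without an Id get a bare {"index":{}} line, so Elasticsearch assigns the id itself.

```shell
# to_bulk_with_ids FILE: hypothetical helper that turns a one-object-per-line
# JSON array into a 7.x-style bulk body, copying each object's "Id" into _id.
to_bulk_with_ids() {
  # Same cleanup as for a plain conversion: drop CRs, array brackets,
  # separating commas and lines left empty.
  tr -d '\r' < "$1" |
    sed -e 's/^[[:space:]]*\[//' \
        -e 's/[]][[:space:]]*$//' \
        -e 's/,[[:space:]]*$//' \
        -e '/^[[:space:]]*$/d' |
    while IFS= read -r doc || [ -n "$doc" ]; do
      # Extract the string value of "Id"; empty when the field is absent.
      id=$(printf '%s\n' "$doc" |
        sed -n 's/.*"Id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
      if [ -n "$id" ]; then
        printf '{"index":{"_id": "%s"}}\n' "$id"
      else
        # No Id: let Elasticsearch generate one.
        printf '%s\n' '{"index":{}}'
      fi
      printf '%s\n' "$doc"
    done
}
```

The output can be posted to /<index_name>/_bulk with curl's --data-binary and a Content-Type: application/x-ndjson header, as in the Windows command earlier in the thread.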

4 answers:
Answer 1: 86 votes, accepted, 2015-10-26 07:09:38
Answer 2: 12 votes, 2018-01-29 10:32:37
Answer 3: 1 vote, 2019-06-17 18:57:30
Answer 4: 0 votes, 2021-08-16 06:28:24
