
Elasticsearch Bulk Index JSON Data

I am trying to bulk index a JSON file into a new Elasticsearch index and am unable to do so. I have the following sample data inside the JSON file:

[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},
{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},
{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},
{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},
{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},
{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},
{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},
{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},
{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},
{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},
{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},
{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},
{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},
{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},
{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]

I am using:

 curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json 

When I try to use the standard bulk index API from Elasticsearch, I get this error:

 error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}

Can anyone help with indexing this type of JSON?

What you need to do is read that JSON file and then build a bulk request in the format expected by the _bulk endpoint, i.e. one line for the command and one line for the document, separated by a newline character. Repeat that pair of lines for each document:

curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'

Just make sure to replace your_index and your_type with the actual index and type names you're using.

UPDATE

Note that each command line in the bulk body can be shortened by removing _index and _type if those are specified in your URL. It is also possible to remove _id if you specify the path to your id field in your mapping (note that this feature will be deprecated in ES 2.0, though). At the very least, the command line can look like {"index":{}} for all documents, but it is always mandatory, because it specifies which kind of operation you want to perform (in this case, index the document).

UPDATE 2

curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary  @/home/data1.json

/home/data1.json should look like this:

{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}

UPDATE 3

You can refer to this answer to see how to generate the new JSON-style file mentioned in UPDATE 2.
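One possible way to generate that file is sketched below in Python. This is not necessarily the approach taken in the linked answer; the function names and file paths are hypothetical, and the sketch assumes the input file holds a single JSON array of objects, as in the question.

```python
import json


def array_to_bulk_lines(docs):
    """Build a _bulk body: one {"index":{}} action line before each document."""
    lines = []
    for doc in docs:
        lines.append('{"index":{}}')
        # json.dumps without indent keeps each document on a single line
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline
    return "\n".join(lines) + "\n"


def convert_file(src_path, dst_path):
    """Read a JSON array from src_path and write the bulk-formatted file."""
    with open(src_path, encoding="utf-8") as src:
        docs = json.load(src)
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write(array_to_bulk_lines(docs))
```

For example, convert_file("/home/data1.json", "/home/data1_bulk.json") would write a file that can be passed to curl with --data-binary (the second path is made up for illustration).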

As of today, 6.1.2 is the latest version of Elasticsearch, and the curl command that works for me on Windows (x64) is:

curl -s -XPOST localhost:9200/my_index/my_index_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @D:\data\mydata.json

The format of the data that should be present in mydata.json remains the same as shown in @val's answer.
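If curl is not available, the same request can be sent from a script. The following is a minimal sketch using only Python's standard library; the function names are hypothetical and the URL in the usage note is an assumption. It sets the same Content-Type header as the curl command above and inspects the "errors" and "items" fields of the bulk response to find documents that were rejected.

```python
import json
import urllib.request


def build_bulk_request(url, ndjson_body):
    """Prepare a POST request carrying a newline-delimited bulk body."""
    if not ndjson_body.endswith("\n"):
        ndjson_body += "\n"  # the bulk body must end with a newline
    return urllib.request.Request(
        url,
        data=ndjson_body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def failed_items(bulk_response):
    """Return (action, _id, error) for every item that reported an error."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result["error"]))
    return failures


def send_bulk(url, ndjson_body):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_bulk_request(url, ndjson_body)) as resp:
        return json.load(resp)
```

A call such as send_bulk("http://localhost:9200/my_index/_bulk", open("mydata.json", encoding="utf-8").read()) would then return the response dictionary, which can be passed to failed_items. A bulk request can return HTTP 200 even when individual documents fail, which is why the per-item check is worth having.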

A valid Elasticsearch bulk API request would be something like this (ending with a newline):

POST http://localhost:9200/products_slo_development_temp_2/productModel/_bulk

{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Stol"} 
{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Miza"} 

Elasticsearch bulk API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

This is how I do it:

I send a POST HTTP request in which the uri variable is the URI/URL of the request and the elasticsearchJson variable is the JSON sent in the request body, formatted for the Elasticsearch bulk API:

var uri = @"/" + indexName + "/productModel/_bulk";
var json = JsonConvert.SerializeObject(sqlResult);
var elasticsearchJson = GetElasticsearchBulkJsonFromJson(json, "RequestedCountry");

Helper method for generating the JSON format required by the Elasticsearch bulk API:

public string GetElasticsearchBulkJsonFromJson(string jsonStringWithArrayOfObjects, string firstParameterNameOfObjectInJsonStringArrayOfObjects)
{
  return @"{ ""index"":{ } } 
" + jsonStringWithArrayOfObjects.Substring(1, jsonStringWithArrayOfObjects.Length - 2).Replace(@",{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""", @" 
{ ""index"":{ } } 
{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""") + @"
";
}

The first property/field in my JSON object is the RequestedCountry property; that's why I use it in this example.

productModel is my Elasticsearch document type. sqlResult is a C# generic list of products.

This answer is for Elasticsearch 7.x onwards, where _type is deprecated. As others have mentioned, you can read the file programmatically and construct a request body as described below. Also, I see that each of your JSON objects has the Id attribute, so you could set the document's internal id (_id) to be the same as this attribute. The updated _bulk API request would look like this:

HTTP Method: POST

URI: /<index_name>/_bulk

Request body (should end with a new line):

{"index":{"_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{"_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
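As a rough sketch of how such a body could be constructed programmatically, the Python function below copies a chosen field of each document into the _id of its action line. The function name and the id_field parameter are hypothetical; the default of "Id" simply matches the sample data in the question.

```python
import json


def bulk_body_with_ids(docs, id_field="Id"):
    """Build a 7.x-style bulk body whose action lines carry _id from id_field."""
    lines = []
    for doc in docs:
        # Action line: no _type, only the document id taken from the document
        action = {"index": {"_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The request body should end with a newline
    return "\n".join(lines) + "\n"
```

Because the _id is derived from the data, re-running the same import overwrites the existing documents instead of creating duplicates with auto-generated ids.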
