Bulk importing data into Elasticsearch

Question

I have Elasticsearch data in JSON that I want to upload in one go via curl:

curl -s -H "Content-Type: application/json" -XPOST localhost:9200/_bulk --data-binary @C:\Users\adm\Desktop\test.json

but I get this error:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Action/metadata line [1] contains an unknown parameter [_score]"}],"type":"illegal_argument_exception","reason":"Action/metadata line [1] contains an unknown parameter [_score]"},"status":400}

and the data (test.json) looks like this:

{"index" :{"_index":"variationdetails","_type":"_doc","_id":"e17bd50b-fe65-423c-a9f8-4d45ecf56559","_score":1,"_source":{"entityname":"Cislo_f","keyFieldObject":null,"keynotcolumn":false,"keyphrase":"daňový doklad č.","keypositionbottom":301,"keypositionleft":1482,"keypositionright":2000,"keypositiontop":251,"keytovaluedeltaleft":551,"keytovaluedeltatop":7,"userchanged":true,"valuepositionbottom":306,"valuepositionleft":2033,"valuepositionright":2387,"valuepositiontop":258,"variationguid":"a20e3d7a-bf38-4eae-9f23-fb100b539d08","vddid":"e17bd50b-fe65-423c-a9f8-4d45ecf56559"}}}
{"index" :{"_index":"variationdetails","_type":"_doc","_id":"c2a831f1-8156-434c-bd84-08db64c935a5","_score":1,"_source":{"entityname":"Datum_splatnosti","keyFieldObject":null,"keynotcolumn":false,"keyphrase":"Datum splatnosti:","keypositionbottom":1154,"keypositionleft":1706,"keypositionright":2015,"keypositiontop":1112,"keytovaluedeltaleft":421,"keytovaluedeltatop":11,"userchanged":true,"valuepositionbottom":1149,"valuepositionleft":2127,"valuepositionright":2298,"valuepositiontop":1123,"variationguid":"a20e3d7a-bf38-4eae-9f23-fb100b539d08","vddid":"c2a831f1-8156-434c-bd84-08db64c935a5"}}}

I tried changing _bulk to variationdetails/_doc, but that didn't help. I can't use elasticdump on the target system (no internet access and no way to copy files over).

Answer 1

The documentation for the bulk insert API gives an example and a description of the required input.

For each record you want to create or update, you need two lines of JSON:

The first line specifies the action to take and the document to take it on: essentially, the details that would be in the URL and HTTP request method of a single-item action.
The second line specifies the data to use: essentially, the details that would be in the body of a single-item action.
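
To make the two-line structure concrete, here is a minimal Python sketch. The helper name bulk_pair is hypothetical and not part of any Elasticsearch client; it only illustrates how the parts of a single-item request (index name and id from the URL, document from the body) map onto the two bulk lines. The shortened document is made up for illustration.

```python
import json

def bulk_pair(index, doc_id, document):
    """Return the two NDJSON lines for one "index" action (hypothetical helper)."""
    # Line 1: what would be in the URL and method of a single-item request.
    action = {"index": {"_index": index, "_id": doc_id}}
    # Line 2: what would be in the body of a single-item request.
    return (json.dumps(action) + "\n"
            + json.dumps(document, ensure_ascii=False) + "\n")

pair = bulk_pair(
    "variationdetails",
    "e17bd50b-fe65-423c-a9f8-4d45ecf56559",
    {"entityname": "Cislo_f", "userchanged": True},  # shortened sample document
)
print(pair, end="")
```

Each call yields exactly one action line and one document line, each terminated by a newline.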

So for your example, it would look like this:

{"index" :{"_index":"variationdetails","_id":"e17bd50b-fe65-423c-a9f8-4d45ecf56559"}}
{"_type":"_doc","_score":1,"_source":{"entityname":"Cislo_f","keyFieldObject":null,"keynotcolumn":false,"keyphrase":"daňový doklad č.","keypositionbottom":301,"keypositionleft":1482,"keypositionright":2000,"keypositiontop":251,"keytovaluedeltaleft":551,"keytovaluedeltatop":7,"userchanged":true,"valuepositionbottom":306,"valuepositionleft":2033,"valuepositionright":2387,"valuepositiontop":258,"variationguid":"a20e3d7a-bf38-4eae-9f23-fb100b539d08","vddid":"e17bd50b-fe65-423c-a9f8-4d45ecf56559"}}
{"index" :{"_index":"variationdetails","_id":"c2a831f1-8156-434c-bd84-08db64c935a5"}}
{"_type":"_doc","_score":1,"_source":{"entityname":"Datum_splatnosti","keyFieldObject":null,"keynotcolumn":false,"keyphrase":"Datum splatnosti:","keypositionbottom":1154,"keypositionleft":1706,"keypositionright":2015,"keypositiontop":1112,"keytovaluedeltaleft":421,"keytovaluedeltatop":11,"userchanged":true,"valuepositionbottom":1149,"valuepositionleft":2127,"valuepositionright":2298,"valuepositiontop":1123,"variationguid":"a20e3d7a-bf38-4eae-9f23-fb100b539d08","vddid":"c2a831f1-8156-434c-bd84-08db64c935a5"}}

I'm not sure whether _source is supposed to be part of the document or not; if not, you probably want this:

{"index" :{"_index":"variationdetails","_id":"e17bd50b-fe65-423c-a9f8-4d45ecf56559"}}
{"_type":"_doc","_score":1,"entityname":"Cislo_f","keyFieldObject":null,"keynotcolumn":false,"keyphrase":"daňový doklad č.","keypositionbottom":301,"keypositionleft":1482,"keypositionright":2000,"keypositiontop":251,"keytovaluedeltaleft":551,"keytovaluedeltatop":7,"userchanged":true,"valuepositionbottom":306,"valuepositionleft":2033,"valuepositionright":2387,"valuepositiontop":258,"variationguid":"a20e3d7a-bf38-4eae-9f23-fb100b539d08","vddid":"e17bd50b-fe65-423c-a9f8-4d45ecf56559"}
{"index" :{"_index":"variationdetails","_id":"c2a831f1-8156-434c-bd84-08db64c935a5"}}
{"_type":"_doc","_score":1,"entityname":"Datum_splatnosti","keyFieldObject":null,"keynotcolumn":false,"keyphrase":"Datum splatnosti:","keypositionbottom":1154,"keypositionleft":1706,"keypositionright":2015,"keypositiontop":1112,"keytovaluedeltaleft":421,"keytovaluedeltatop":11,"userchanged":true,"valuepositionbottom":1149,"valuepositionleft":2127,"valuepositionright":2298,"valuepositiontop":1123,"variationguid":"a20e3d7a-bf38-4eae-9f23-fb100b539d08","vddid":"c2a831f1-8156-434c-bd84-08db64c935a5"}
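
If the file has many records, the conversion could be scripted rather than done by hand. The following is a rough Python sketch, not a definitive tool. It assumes every line of test.json has the shape shown in the question, that the real document is the content of _source, and that _score and _type should be dropped (mapping types are deprecated in Elasticsearch 7 and removed in 8; keep _type if your version still needs it). The function names and the shortened sample records are made up for illustration.

```python
import json

def hit_line_to_bulk(line):
    """Split one line of the original file into an action line and a document line."""
    wrapper = json.loads(line)
    # Each original line has a single top-level key naming the action, e.g. "index".
    action, hit = next(iter(wrapper.items()))
    # Only _index and _id go on the action/metadata line.
    meta = {"_index": hit["_index"], "_id": hit["_id"]}
    # Assumption: the document is the content of "_source"; "_score" and "_type" are dropped.
    document = hit["_source"]
    return (json.dumps({action: meta}, ensure_ascii=False),
            json.dumps(document, ensure_ascii=False))

def convert(lines):
    """Convert all non-empty input lines into a bulk body ending with a newline."""
    out = []
    for line in lines:
        if line.strip():
            out.extend(hit_line_to_bulk(line))
    return "\n".join(out) + "\n"

# Shortened sample records in the shape shown in the question.
sample = [
    '{"index" :{"_index":"variationdetails","_type":"_doc","_id":"e17bd50b-fe65-423c-a9f8-4d45ecf56559","_score":1,"_source":{"entityname":"Cislo_f","keyphrase":"daňový doklad č."}}}',
    '{"index" :{"_index":"variationdetails","_type":"_doc","_id":"c2a831f1-8156-434c-bd84-08db64c935a5","_score":1,"_source":{"entityname":"Datum_splatnosti","keyphrase":"Datum splatnosti:"}}}',
]
result = convert(sample)
print(result, end="")
```

To use it on the real file, read test.json line by line, pass the lines to convert, and write the result as UTF-8 to a new file that the curl command from the question can then post to _bulk.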

Answer 2

If you want to transfer data from one cluster to another, the best option is to use the Snapshot and Restore API of Elasticsearch.

If you want to use the _bulk API, you need to follow the bulk API format, so your file has to be laid out exactly as shown below. You can create the file in NDJSON format for the bulk API.

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
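
The same layout can be generated programmatically. Below is a small Python sketch under stated assumptions: build_bulk_body and the tuple layout are made up for this illustration, and the only rules it encodes are the ones visible in the example above (a delete action has no source line, the other actions have one, and every line ends with a newline).

```python
import json

def build_bulk_body(operations):
    """Build an NDJSON bulk body from (action, metadata, source_or_None) tuples."""
    lines = []
    for action, meta, source in operations:
        if action == "delete":
            # A delete action is a single line with no source after it.
            if source is not None:
                raise ValueError("delete takes no source line")
        elif source is None:
            raise ValueError(f"{action} needs a source line")
        lines.append(json.dumps({action: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # The body must end with a newline after the last line.
    return "\n".join(lines) + "\n"

ops = [
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_id": "1", "_index": "test"}, {"doc": {"field2": "value2"}}),
]
body = build_bulk_body(ops)
print(body, end="")
```

The resulting string is what would go in the request body of a POST to _bulk, for example after being written to a file for curl's --data-binary option.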

You are getting the error for _score because it is an internal field of Elasticsearch, not a parameter you can set on an action/metadata line: it holds the relevance score of a document computed for your query.

2 answers

Answer 1
1 2022-03-11 12:35:16

Answer 2
0 2022-03-11 12:41:05
