In Elasticsearch, how do you bulk index a JSON file containing multiple multi-value documents all at once?

I've already looked through the ES documentation and read related questions, but none of the suggestions have worked for me so far.

Basically, I have a JSON file containing multiple documents in this format:

[ { 
    "account": "Sam420", 
    "language": null, 
    "watchers": 0, 
    "commits": 14, 
    "contributors": 2, 
    "stars": 0, 
    "rank": 16, 
}
{ 
    "account": "Kelly", 
    "language": null, 
    "watchers": 0, 
    "commits": 14, 
    "contributors": 2, 
    "stars": 0, 
    "rank": 16, 
} ] 

I tried sending a POST request to the bulk API on my local ES setup with the following body format:

 { "index": {} }
 { 
    "account": "Kelly", 
    "language": null, 
    "watchers": 0, 
    "commits": 14, 
    "contributors": 2, 
    "stars": 0, 
    "rank": 16, 
} 
{ "index": {} }
{ 
    "account": "Kelly", 
    "language": null, 
    "watchers": 0, 
    "commits": 14, 
    "contributors": 2, 
    "stars": 0, 
    "rank": 16, 
} 

But I'm getting a parser error. It does work when I rearrange the data so that each document sits on a single line, like this:

{ "index": { "_index": "folder" } }
{ "account": "Sam420", "language": null, ... }
{ "index": { "_index": "Canigan"} }
{ "account": "Kelly", "language": null, ... } 

Here's the parser error:

{
    "error": {
    "root_cause": [
       {
          "type": "json_parse_exception",
          "reason": "Unexpected character (':' (code 58)): expected a      
                    valid value (number, String, array, object, 'true',     
                    'false' or 'null')\n at [Source: [B@6bd0ddf7; line: 
                    1, column: 10]"
        }],
           "type": "json_parse_exception",
           "reason": "Unexpected character (':' (code 58)): expected a 
                     valid value (number, String, array, object 'true', 
                    'false' or 'null')\n at [Source: [B@6bd0ddf7; line: 
                    1, column: 10]"
       },
       "status": 500
}

However, I'm pulling repo data with 100+ documents from the GitHub API, and each value is arranged vertically (one property per line). Without reformatting it with a script, what can I do to bulk index multiple documents in the JSON format I've been given? If that isn't possible, is there any way other than the bulk API to index multiple documents at once?

The documentation for version 5.5 makes this pretty clear: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

Because this format uses literal \n's as delimiters, please be sure that the JSON actions and sources are not pretty printed.

You must lay out the objects as single lines.

That being said, you don't really need a complicated script to reformat your objects. You could use something like Notepad++ to replace ",\n" (comma then newline) with ", " (comma then space). Then interleave your index/metadata lines like you're doing.
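
If a text editor is inconvenient, the same flattening can be sketched in a few lines of Python. This is only an illustration under some assumptions: the input is strictly valid JSON (an array of objects, commas between them, no trailing commas), and the function name `to_bulk_body` and the trimmed sample documents are made up for the example. The empty `{"index": {}}` action line assumes the target index is supplied some other way, for instance in the request URL.

```python
import json

def to_bulk_body(pretty_json: str) -> str:
    """Turn a JSON array of documents into a newline-delimited bulk body."""
    docs = json.loads(pretty_json)  # assumes strictly valid JSON
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line, one per document
        lines.append(json.dumps(doc))            # source, serialized on exactly one line
    return "\n".join(lines) + "\n"               # the bulk body must end with a newline

# Hypothetical, shortened version of the data in the question.
sample = """
[
  {
    "account": "Sam420",
    "language": null,
    "rank": 16
  },
  {
    "account": "Kelly",
    "language": null,
    "rank": 16
  }
]
"""

print(to_bulk_body(sample), end="")
```

`json.dumps` never emits line breaks unless `indent` is set, so each document is guaranteed to land on a single line however it was laid out in the input.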

You might also need to watch out for the trailing comma at the end of each list of properties, since strict JSON does not allow it.
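
One rough way to deal with those trailing commas before parsing is a regular expression, sketched below. The helper name `strip_trailing_commas` is hypothetical, and the pattern is deliberately naive: it would also alter a string value that happens to contain a comma followed by a closing brace or bracket, so treat it as a starting point rather than a general JSON repair tool.

```python
import json
import re

# A comma followed only by whitespace and then a closing brace or bracket.
TRAILING_COMMA = re.compile(r",(\s*[}\]])")

def strip_trailing_commas(text: str) -> str:
    """Drop commas that directly precede a closing '}' or ']'."""
    return TRAILING_COMMA.sub(r"\1", text)

raw = '{ "account": "Kelly", "rank": 16, }'
doc = json.loads(strip_trailing_commas(raw))  # parses once the comma is gone
print(json.dumps(doc))
```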

I think it doesn't work because the action lines carry no information about the index and the index type.

{ "index": { "_index": "my_index", "_type": "my_index_type" } }
{ "account": "Sam420", "language": null, ... }
{ "index": { "_index": "my_index", "_type": "my_index_type" } }
{ "account": "Kelly", "language": null, ... }
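
As a hedged sketch of how such a body could be assembled and addressed to the `_bulk` endpoint, the Python below builds the request without sending it. The base URL `http://localhost:9200`, the names `my_index` and `my_index_type`, and the helpers `action_line` and `build_bulk_request` are all assumptions for illustration. The `_type` field matches the 5.x line discussed here; later Elasticsearch versions drop mapping types, so it would need to be omitted there.

```python
import json
import urllib.request

def action_line(index: str, doc_type: str) -> str:
    """Single-line action/metadata entry naming the target index and type."""
    return json.dumps({"index": {"_index": index, "_type": doc_type}})

def build_bulk_request(base_url, docs, index, doc_type):
    """Prepare (but do not send) a POST request for the _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(action_line(index, doc_type))
        lines.append(json.dumps(doc))  # one document per line
    body = ("\n".join(lines) + "\n").encode("utf-8")  # trailing newline is required
    return urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )

req = build_bulk_request(
    "http://localhost:9200",  # assumed local node
    [{"account": "Sam420"}, {"account": "Kelly"}],
    "my_index",
    "my_index_type",
)
print(req.data.decode("utf-8"), end="")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs without a cluster.
```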

