Updating complex nested elasticsearch document using logstash and jdbc

Let's assume that the Oracle schema has the following tables and columns:

    Country
        country_id; (Primary Key)
        country_name;

    Department
        department_id; (Primary Key)
        department_name;
        country_id; (Foreign key to Country:country_id)

    Employee
        employee_id; (Primary Key)
        employee_name;
        department_id; (Foreign key to Department:department_id)

And I have my Elasticsearch document where the root element is a Country, which contains all Departments in that Country, which in turn contain all Employees in their respective Departments.

So the document structure looks like this:

{
  "mappings": {
    "country": {
      "properties": {
        "country_id": { "type": "string" },
        "country_name": { "type": "string" },
        "department": {
          "type": "nested",
          "properties": {
            "department_id": { "type": "string" },
            "department_name": { "type": "string" },
            "employee": {
              "type": "nested",
              "properties": {
                "employee_id": { "type": "string" },
                "employee_name": { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}

I want to be able to have separate jdbc input queries running on each table, and they should create/update/delete data in the Elasticsearch document whenever the data in the base tables are added/updated/deleted.
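For illustration, a minimal sketch of the input side I have in mind is below; the connection string, credentials, driver path, and SQL are placeholders, not my actual setup:

input {
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@//dbhost:1521/ORCL"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    schedule => "* * * * *"
    statement => "SELECT country_id AS id, country_name FROM Country"
    tags => ["country"]
  }
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@//dbhost:1521/ORCL"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    schedule => "* * * * *"
    statement => "SELECT department_id, department_name, country_id FROM Department"
    tags => ["department"]
  }
}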

This is an example problem; the actual tables and data structure are more complex, so I am not looking for a solution limited to this.

Is there a way to achieve this?

Thanks.

For level one, it's straightforward using the aggregate filter. You need to have a common id between the events to reference. Note that the aggregate filter relies on events being processed in order, so Logstash has to run with a single pipeline worker (-w 1) for it to behave correctly.

filter {

  aggregate {
    task_id => "%{id}"

    code => "
      map['id'] = event.get('id')
      map['department'] ||= []
      # append the whole event as one department entry
      map['department'] << event.to_hash
    "
    push_previous_map_as_event => true
    timeout => 150000
    timeout_tags => ['aggregated']
  }

  # only the aggregated event should reach the output
  if "aggregated" not in [tags] {
    drop {}
  }
}
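To make the effect concrete, here is a made-up illustration: if the jdbc statement joins Country and Department so that each event carries the country id plus one department's columns, events like these:

{ "id": "C1", "country_name": "France", "department_id": "D1", "department_name": "Sales" }
{ "id": "C1", "country_name": "France", "department_id": "D2", "department_name": "HR" }

are folded by the filter above into one aggregated event per country, roughly:

{ "id": "C1", "department": [ { "department_id": "D1", ... }, { "department_id": "D2", ... } ] }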

Important: the output action should be update.

output {
  elasticsearch {
    action => "update"
    ...
  }
}
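A slightly fuller version of that output might look like the sketch below; the hosts and index name are placeholders, and doc_as_upsert makes the update create the document when it does not exist yet:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "countries"
    document_id => "%{id}"     # route the partial document to the matching country
    action => "update"
    doc_as_upsert => true      # insert on first sight, update afterwards
  }
}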

One way to solve level 2 is to query the already-indexed document and update it with the nested record, again using the aggregate filter; there should be a common id for the document so you can look up and insert into the correct document.

filter {

  # fetch the already-indexed document by id and copy its
  # 'employee' field into the event as 'emp'
  elasticsearch {
    hosts => ["${ELASTICSEARCH_HOST}/${INDEX_NAME}/${INDEX_TYPE}"]
    query => "id:%{id}"
    fields => { "employee" => "emp" }
  }

  aggregate {
    task_id => "%{id}"
    code => "
      map['id'] = event.get('id')
      map['employee'] = []

      # copy the incoming row's fields into a hash
      temp_emp = {}
      event.to_hash.each do |key, value|
        temp_emp[key] = value
      end

      # push the object into an array
      employeeArr = []
      employeeArr.push(temp_emp)

      # attach the new employee array to every record fetched
      # from the index, then collect them for the update
      empArr = event.get('emp')
      for emp in empArr
        emp['employee'] = employeeArr
        map['employee'].push(emp)
      end
    "
    push_previous_map_as_event => true
    timeout => 150000
    timeout_tags => ['aggregated']
  }

  if "aggregated" not in [tags] {
    drop {}
  }
}

output {
  elasticsearch {
    action => "update"    # important
    ...
  }
}
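One caveat worth noting about this read-then-update pattern: since the document is fetched and then written back, two pipelines updating the same country at the same time can collide. If that is a concern, the elasticsearch output's retry_on_conflict setting can be raised so the partial update is retried on version conflicts; hosts and index below are placeholders:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "countries"
    document_id => "%{id}"
    action => "update"
    retry_on_conflict => 3   # retry partial updates that hit a version conflict
  }
}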

Also, in order to debug the ruby code, use the below in the output:

output {
  stdout { codec => dots }
}
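The dots codec prints one dot per event, which is handy for watching throughput; to inspect the full contents of each aggregated event instead, the rubydebug codec can be used:

output {
  stdout { codec => rubydebug }
}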
