
Logstash: MySQL to Elasticsearch (large table)?

I am attempting to import a rather chunky database into Elasticsearch. It has about 4 million rows across two columns (VARCHAR(250) and INT(20)).
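For reference, the table looks roughly like this (schema sketched from memory, so take the exact definition with a grain of salt; the column names match the SELECT in my config below):

-- Rough shape of the table: ~4 million rows, two columns
CREATE TABLE master_table (
  name VARCHAR(250),
  id   INT(20) NOT NULL,
  PRIMARY KEY (id)
);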

When I run the logstash.conf file to import the database into Elasticsearch with a LIMIT 0,100 added to my SQL statement, the import runs without any problems. All of the rows are echoed by Logstash in the terminal, and I can then see them in the relevant index in Elasticsearch.
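Concretely, the working statement is just the one from my config below with the limit appended:

statement => "SELECT name, id FROM master_table LIMIT 0,100"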

When I try to run all of the rows through Logstash at once, it outputs:

Settings: Default pipeline workers: 1
Pipeline main started

And nothing more happens.

How do I add such a large table into Elasticsearch?

Here's my logstash.conf script:

input {
  jdbc {
    jdbc_driver_library => "/opt/logstash/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://<ip number>:3306/database"
    jdbc_validate_connection => true
    jdbc_user => "elastic"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "SELECT name, id FROM master_table"
  }
}

output {
  elasticsearch {
    index => "search"
    document_type => "name"
    document_id => "%{id}"
    hosts => "127.0.0.1:9200"
  }
  stdout { codec => json_lines }
}

I would set the fetch size to something like 10000 documents. As it is, I think Logstash tries to load all the records into memory at once, which can take hours and probably won't fit.

See https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_dealing_with_large_result_sets

input {
    jdbc {
        ...
        jdbc_page_size => 100000 
        jdbc_paging_enabled => true
    }
}
...
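Put together with the config from the question, a minimal sketch could look like this (I'm also adding jdbc_fetch_size, the plugin option that passes a fetch-size hint to the JDBC driver; note that the MySQL driver only honours a plain fetch size with a server-side cursor, which you can request with useCursorFetch=true on the connection string):

input {
  jdbc {
    jdbc_driver_library => "/opt/logstash/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # useCursorFetch=true asks MySQL for a server-side cursor so the
    # fetch size is honoured instead of the whole result set being buffered
    jdbc_connection_string => "jdbc:mysql://<ip number>:3306/database?useCursorFetch=true"
    jdbc_user => "elastic"
    jdbc_password => "password"
    # paging rewrites the statement into count + LIMIT/OFFSET queries
    jdbc_paging_enabled => true
    jdbc_page_size => 100000
    # fetch-size hint handed down to the JDBC driver
    jdbc_fetch_size => 10000
    statement => "SELECT name, id FROM master_table"
  }
}

With paging enabled, the plugin wraps the statement in a count query followed by LIMIT/OFFSET queries, which is exactly what the log output below shows.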

Result:

[logstash.inputs.jdbc] SELECT count(*) AS count FROM (SELECT * FROM my_table) AS t1 LIMIT 1

[logstash.inputs.jdbc] SELECT * FROM (SELECT * FROM my_table) AS t1 LIMIT 100000 OFFSET 0

[logstash.inputs.jdbc] SELECT * FROM (SELECT * FROM my_table) AS t1 LIMIT 100000 OFFSET 100000

...
