
Import old data from postgres to elasticsearch

I have a lot of data in my postgres database (on a remote server). It is the data of the past year, and I want to push it to elasticsearch now.

The data has a time field in this format: 2016-09-07 19:26:36.817039+00.

I want this to be the time field (@timestamp) in elasticsearch, so that I can view it in kibana and see some visualizations over the last year.

I need help on how to push all this data efficiently. I cannot figure out how to get all of it out of postgres.

I know we can ingest data via the jdbc plugin, but I don't think I can create my @timestamp field with that.

I also know about zombodb, but I'm not sure whether it lets me specify my own time field.

Also, the data is in bulk, so I am looking for an efficient solution.

I need help on how I can do this, so suggestions are welcome.

"I know we can ingest data via the jdbc plugin, but I don't think I can create my @timestamp field with that."

This should be doable with Logstash. A good starting point is this blog post. And remember that Logstash always consists of three parts:

  1. Input: the JDBC input. If you only need to import once, skip the schedule option; otherwise set the right timing in cron syntax.
  2. Filter: This one is not part of the blog post. You will need to use the date filter to set the right @timestamp value; an example is added at the end.
  3. Output: This is simply the Elasticsearch output.
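Putting the three parts together, a minimal pipeline might look like the sketch below. The connection string, credentials, table name, and field name are placeholders to replace with your own; the date pattern will also need to match your actual column format (see the note on the date filter further down):

```conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://your-host:5432/your_db"  # placeholder
    jdbc_user => "your_user"                                             # placeholder
    jdbc_driver_library => "/path/to/postgresql.jar"                     # placeholder
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "SELECT * FROM your_table"                              # placeholder
    # No schedule option: run the import once and exit.
  }
}

filter {
  date {
    # Placeholder field name and pattern; adjust to your column's actual format.
    match => ["your_date_field", "yyyy-MM-dd HH:mm:ss.SSSSSS"]
    remove_field => ["your_date_field"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "your-index"   # placeholder
  }
}
```

Note that the jdbc input may already hand timestamp columns to Logstash as timestamp objects, in which case the date filter's pattern matching may behave differently; check what the field looks like in a test run before relying on the pattern above.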

The exact pattern will depend on the format and field name of the timestamp value in PostgreSQL, but the filter part should look something like this:

date {
   match => ["your_date_field", "dd-MM-yyyy HH:mm:ss"]
   remove_field => ["your_date_field"] # Remove the now redundant field, since we're storing it in @timestamp (the default target of date)
}

If you're concerned about performance:
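For a bulk import, one relevant knob is result-set paging on the JDBC input, which avoids loading the whole table into memory at once. The paging options below are real options of the logstash-input-jdbc plugin, but the values are assumptions to tune for your data:

```conf
jdbc {
  # ...connection settings omitted...
  jdbc_paging_enabled => true   # fetch the result set in pages instead of all at once
  jdbc_page_size => 50000       # rows per page; tune to your row size and memory
  jdbc_fetch_size => 1000       # fetch-size hint passed to the JDBC driver
}
```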

