
Index data from PostgreSQL to Elasticsearch

I want to set up an Elasticsearch cluster using the multicast feature. One node is an external Elasticsearch node and the other is a node client (the client property is set to true, so it holds no data).

This node client is created using Spring Data Elasticsearch. I want to index data from a PostgreSQL database to the external Elasticsearch node. So far I have indexed the data using the JDBC river plugin.

Is there any application I can use to index data from PostgreSQL instead of the river plugin?

It is possible to do this in real time, although it requires writing a dedicated Postgres->ES gateway and using some Postgres-specific features. I've written about it here: http://haltcondition.net/2014/04/realtime-postgres-elasticsearch/

The principle is actually pretty simple; the complexity of the method I have come up with comes from handling corner cases such as multiple gateways running at once and gateways becoming unavailable for a while. In short, my solution is:

  • Attach a trigger to all tables of interest that copies the updated row IDs to a temporary table.
  • The trigger also emits an async notification that a row has been updated.
  • A separate gateway (mine is written in Clojure) attaches to the Postgres server and listens for notifications. This is the tricky part, as not all Postgres client drivers support async notifications (there is a new experimental JDBC driver that does, which is what I use).
  • On update, the gateway reads, transforms and pushes the data to Elasticsearch.
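
The steps above can be sketched roughly as follows. This is not the Clojure gateway from the linked project; it is a minimal Python illustration that swaps in the third-party psycopg2 driver, which exposes Postgres LISTEN/NOTIFY. The table `articles` (with columns `id`, `title`, `body`), the queue table `es_pending`, the channel `es_updates` and all function names are hypothetical, and the "temporary table" is assumed here to be an ordinary table used as a queue so that the gateway's session can see it. It does not handle the corner cases mentioned above (multiple gateways, downtime).

```python
import json
import select
import urllib.request

# Hypothetical schema: trigger that queues the row ID and emits a notification.
TRIGGER_DDL = """
CREATE TABLE IF NOT EXISTS es_pending (
    table_name text   NOT NULL,
    row_id     bigint NOT NULL
);

CREATE OR REPLACE FUNCTION queue_es_update() RETURNS trigger AS $$
BEGIN
    INSERT INTO es_pending (table_name, row_id) VALUES (TG_TABLE_NAME, NEW.id);
    PERFORM pg_notify('es_updates', format('%s:%s', TG_TABLE_NAME, NEW.id));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_es_update
AFTER INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE PROCEDURE queue_es_update();
"""


def parse_payload(payload):
    """Split a 'table:id' notification payload into (table, id)."""
    table, _, row_id = payload.rpartition(":")
    return table, int(row_id)


def bulk_body(index, rows):
    """Build an Elasticsearch _bulk body (NDJSON) indexing each row by its id.

    Older Elasticsearch versions also expect a "_type" in the action line.
    """
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": row["id"]}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def push(es_url, body):
    """POST a bulk body to Elasticsearch and return the decoded response."""
    req = urllib.request.Request(
        es_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def listen(dsn, es_url, channel="es_updates"):
    """Wait for notifications and push each updated row to Elasticsearch."""
    import psycopg2  # third-party; imported here so the helpers work without it
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute("LISTEN " + channel + ";")
    while True:
        if select.select([conn], [], [], 5) == ([], [], []):
            continue  # timed out with nothing to read; wait again
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            table, row_id = parse_payload(note.payload)
            # Only the hypothetical "articles" table is handled in this sketch.
            cur.execute(
                "SELECT id, title, body FROM articles WHERE id = %s", (row_id,)
            )
            row = cur.fetchone()
            if row is not None:
                doc = {"id": row[0], "title": row[1], "body": row[2]}
                push(es_url, bulk_body(table, [doc]))
            cur.execute(
                "DELETE FROM es_pending WHERE table_name = %s AND row_id = %s",
                (table, row_id),
            )
```

The queue table is what would let a gateway catch up after being offline: notifications are only delivered to sessions that are listening at the time, so on startup a fuller implementation would first drain `es_pending` before entering the loop.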

In my experiments this model is capable of sub-second updates to Elasticsearch after a Postgres row insert/update. Obviously this will vary in the real world.

There is a proof-of-concept project with Vagrant and Docker test frameworks here: https://bitbucket.org/tarkasteve/postgres-elasticsearch-realtime

