
Indexing data from postgres to solr/elasticsearch

What is the best way to index constantly changing data in a PostgreSQL database into Solr or Elasticsearch?

I have a PostgreSQL database on AWS RDS and I want to run complex searches against it. However, the data I will be querying changes constantly, with a very high rate of writes and updates, so I am not sure how to transfer it to Solr/Elasticsearch efficiently and reliably.

Thanks for the help.

At the risk of someone marking this question as a duplicate, here's the link to setting up postgres-to-elasticsearch in another StackOverflow thread. There's also this blog post on Atlassian that talks about how to get real-time updates from PostgreSQL into Elasticsearch.

The Atlassian post, for the tl;dr crowd, uses PostgreSQL stored procedures to copy updated/inserted data to a staging table, then processes the staging table separately. It's a nice approach that would work for either ES or Solr. Unfortunately, it's a roll-your-own solution, unless you are familiar with Clojure.

In the case of Solr, a common approach is to use the Data Import Handler (DIH for short). Configure the full-import and delta-import SQL properly; a delta-import pulls in only the rows that have changed since the last import, judged by timestamps (so you need to design the schema with proper timestamp columns).
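The timestamp comparison behind a delta import can be sketched independently of DIH. The example below again uses Python's sqlite3 as a stand-in for PostgreSQL; the `docs` table, its `updated_at` column, and the helpers `make_demo_db` and `fetch_changed_since` are assumptions for illustration and not part of DIH. In DIH itself, the equivalent comparison would live in the delta SQL of the data config.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table with a last-modified timestamp per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, updated_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)",
        [
            (1, "old", "2024-01-01T00:00:00"),
            (2, "new", "2024-01-02T00:00:00"),
            (3, "newer", "2024-01-03T00:00:00"),
        ],
    )
    return conn


def fetch_changed_since(conn, last_import):
    """Return (rows, new_watermark) for rows modified after last_import.

    Timestamps are ISO-8601 strings, which sort correctly as plain text.
    """
    rows = conn.execute(
        "SELECT id, body, updated_at FROM docs "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_import,),
    ).fetchall()
    # Advance the watermark to the newest row seen; keep it unchanged if nothing matched.
    new_watermark = rows[-1][2] if rows else last_import
    return rows, new_watermark
```

This only works if every write actually bumps `updated_at`, which is why the schema needs a reliably maintained timestamp column. Under heavy concurrent writes, a row that commits late with an earlier timestamp could be missed, so an overlap window may be worth considering.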

The timing of delta-import has two styles, which can be used separately or combined:

  • Run delta-import on a timer (e.g. every 5 minutes).
  • After each update in the database, make a call to delta-import.
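Both styles come down to sending DIH a `delta-import` command over HTTP. Below is a minimal sketch using only the Python standard library. The base URL, the core name, and the `/dataimport` handler path are assumptions (the path depends on how the handler is registered in your `solrconfig.xml`), and the function names are hypothetical.

```python
import time
import urllib.parse
import urllib.request


def delta_import_url(base_url, core):
    """Build the DIH delta-import URL; clean=false keeps the existing index."""
    query = urllib.parse.urlencode(
        {"command": "delta-import", "clean": "false", "commit": "true"}
    )
    return f"{base_url.rstrip('/')}/{urllib.parse.quote(core)}/dataimport?{query}"


def trigger_delta_import(base_url, core, timeout=10):
    """Ask Solr to start a delta import; call this after a database update."""
    with urllib.request.urlopen(delta_import_url(base_url, core), timeout=timeout) as resp:
        return resp.status


def run_on_timer(base_url, core, interval_seconds=300, iterations=None,
                 trigger=trigger_delta_import, sleep=time.sleep):
    """Trigger a delta import every interval_seconds (forever if iterations is None)."""
    count = 0
    while iterations is None or count < iterations:
        trigger(base_url, core)
        count += 1
        if iterations is None or count < iterations:
            sleep(interval_seconds)
    return count
```

For example, `run_on_timer("http://localhost:8983/solr", "mycore")` would request a delta import every 5 minutes, while `trigger_delta_import` can be called directly from the code path that writes to the database. With very frequent writes, calling it on every update is likely to be too chatty, so combining a timer with occasional explicit calls may be the more practical option.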

Refer to https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler for DIH details.
