
What is the better way to index data from Oracle/relational tables into Elasticsearch?

Posted 2014-12-15 14:20:41 | Tags: java, oracle, elasticsearch, relational-database, elasticsearch-plugin

What are the options for indexing a large volume of data from an Oracle DB into an Elasticsearch cluster? The requirement is a one-time index of 300 million records into multiple indexes, plus incremental updates of approximately 1 million changes per day.

I have tried the JDBC plugin for Elasticsearch (river/feeder); both seem either to run inside Elasticsearch or to require a locally running Elasticsearch instance. Please let me know if there is a better option for running an Elasticsearch indexer as a standalone job (probably Java-based). Any suggestions would be very helpful. Thanks.

2 Answers

We use ES as a reporting DB, and when new records are written to SQL we take the following steps to get them into ES:

Write the primary key into a queue (we use RabbitMQ)
A consumer picks the primary key off the Rabbit queue (when it has time), queries the relational DB for the info it needs, and then writes the data into ES

This process works great because it handles both new data and old data. For old data, just write a quick script that pushes the 300M primary keys into Rabbit and you're done!
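
A minimal Java sketch of the two steps above might look like the following. It makes several assumptions: a BlockingQueue stands in for the RabbitMQ queue, and the loadRow and indexDoc callbacks are hypothetical stand-ins for the SQL lookup and the ES write. None of these names come from the answer itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Sketch of a queue-driven indexer. All names are hypothetical. */
final class KeyQueueIndexer {
    private final BlockingQueue<Long> keys;                               // stand-in for the RabbitMQ queue
    private final Function<Long, Optional<Map<String, Object>>> loadRow;  // stand-in for the SQL lookup
    private final BiConsumer<Long, Map<String, Object>> indexDoc;         // stand-in for the ES write

    KeyQueueIndexer(BlockingQueue<Long> keys,
                    Function<Long, Optional<Map<String, Object>>> loadRow,
                    BiConsumer<Long, Map<String, Object>> indexDoc) {
        this.keys = keys;
        this.loadRow = loadRow;
        this.indexDoc = indexDoc;
    }

    /** Step 1: the writer (or a backfill script) enqueues only the primary key. */
    void enqueue(long primaryKey) {
        keys.add(primaryKey);
    }

    /** Step 2: take each queued key, re-read the row, and index it. Returns the number indexed. */
    int drain() {
        int indexed = 0;
        Long key;
        while ((key = keys.poll()) != null) {
            Optional<Map<String, Object>> row = loadRow.apply(key);
            if (row.isPresent()) {        // a row deleted since it was queued is skipped
                indexDoc.accept(key, row.get());
                indexed++;
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<Long, Map<String, Object>> table = new HashMap<>();   // pretend relational table
        table.put(1L, Map.of("name", "alice"));
        Map<Long, Map<String, Object>> es = new HashMap<>();      // pretend ES index
        KeyQueueIndexer indexer = new KeyQueueIndexer(
                new LinkedBlockingQueue<>(), k -> Optional.ofNullable(table.get(k)), es::put);
        indexer.enqueue(1L);
        indexer.enqueue(99L);                                     // no such row
        System.out.println("indexed " + indexer.drain() + " document(s)");
    }
}
```

Because only the key travels through the queue, the consumer always indexes the row as it exists at read time, and the same path serves both the initial backfill and the daily changes.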

There are many integration options. I've listed a few to give you some ideas, though the right solution will depend on your specific resources and requirements.

Oracle GoldenGate reads the Oracle DB transaction logs and feeds the changes to ES in real time.
An ETL tool, for example Oracle Data Integrator, could run on a schedule, pull data from your DB, transform it, and send it to ES.
Create triggers in the Oracle DB so that data updates are written to ES by a stored procedure. Alternatively, have the trigger write flags to a "changes" table that an external process (e.g. a Java application) monitors and uses to extract the changed data from the Oracle DB.
Have the application that writes to the Oracle DB also feed ES. Ideally your application and Oracle DB should be loosely coupled: do you have an integration platform that can feed the messages to both ES and Oracle?
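
As a rough illustration of the "changes" table option, a standalone Java job could read pending changes in batches and turn each batch into a request body for the Elasticsearch bulk API (newline-delimited JSON sent to the _bulk endpoint). The sketch below covers only that pure part, with no JDBC or HTTP. The Change class, the method names, and the idea that each row is already serialized as single-line JSON are all assumptions. Older Elasticsearch releases, including those from the time of this question, also expect a _type field in each action line.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn rows from a hypothetical "changes" table into _bulk request bodies. */
final class BulkBodySketch {

    /** One pending change. jsonDoc is the current row as single-line JSON, or null for a delete. */
    static final class Change {
        final long id;          // primary key of the changed Oracle row
        final String jsonDoc;

        private Change(long id, String jsonDoc) {
            this.id = id;
            this.jsonDoc = jsonDoc;
        }

        static Change upsert(long id, String jsonDoc) { return new Change(id, jsonDoc); }
        static Change delete(long id) { return new Change(id, null); }
    }

    /** Split the pending changes into batches so that no single bulk request grows too large. */
    static <T> List<List<T>> batches(List<T> items, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            out.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return out;
    }

    /**
     * Build the NDJSON body: an action line per change, followed by the document for upserts.
     * The index name is not escaped, on the assumption that it is a valid ES index name
     * (those cannot contain quotes or backslashes).
     */
    static String bulkBody(String index, List<Change> changes) {
        StringBuilder sb = new StringBuilder();
        for (Change c : changes) {
            String action = (c.jsonDoc == null) ? "delete" : "index";
            sb.append("{\"").append(action).append("\":{\"_index\":\"").append(index)
              .append("\",\"_id\":\"").append(c.id).append("\"}}\n");
            if (c.jsonDoc != null) {
                sb.append(c.jsonDoc).append('\n');   // the body must end with a newline
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Change> pending = List.of(
                Change.upsert(7, "{\"name\":\"alice\"}"),
                Change.delete(8));
        for (List<Change> batch : batches(pending, 1000)) {
            System.out.print(bulkBody("customers", batch));
        }
    }
}
```

The job would POST each body to the cluster and mark the batch as processed in the "changes" table only after the response has been checked, so a failed batch is retried on the next poll.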