

Can you send AWS RDS Postgres logs to an AWS Hadoop cluster easily?

In particular, I'd like to push all of the INSERT, UPDATE, and DELETE statements from my Postgres logs to an AWS Hadoop cluster and have a nice way to search them to see the history of a row or rows.

I'm not a Hadoop expert in any way, so let me know if this is a red herring.

Thanks!

Use Flume to send logs from your RDS instance to the Hadoop cluster. With Flume you can use the regex filtering interceptor to filter events and forward only the INSERT, UPDATE, and DELETE statements. Hadoop by itself does not make your data searchable, so you have to put something like Solr on top of it.
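A minimal sketch of such an agent, assuming the Postgres log file has already been pulled to the machine running Flume (all paths, component names, and the hdfs:// URL are placeholders). Note that INSERT/UPDATE/DELETE statements only appear in the Postgres logs if `log_statement` is set to `mod` (or `all`) in the RDS parameter group:

```
# flume-agent.properties -- minimal sketch, not a production config
agent.sources  = pglog
agent.channels = mem
agent.sinks    = sink1

# Tail the (already downloaded) Postgres log file
agent.sources.pglog.type = exec
agent.sources.pglog.command = tail -F /var/log/postgresql/postgresql.log
agent.sources.pglog.channels = mem

# Regex filtering interceptor: with excludeEvents=false, only events
# matching the regex (i.e. DML statements) are passed through
agent.sources.pglog.interceptors = dml
agent.sources.pglog.interceptors.dml.type = regex_filter
agent.sources.pglog.interceptors.dml.regex = .*(INSERT|UPDATE|DELETE).*
agent.sources.pglog.interceptors.dml.excludeEvents = false

agent.channels.mem.type = memory
agent.channels.mem.capacity = 10000

# Write the filtered events into HDFS, partitioned by day
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.channel = mem
agent.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/pglogs/%Y-%m-%d
agent.sinks.sink1.hdfs.fileType = DataStream
agent.sinks.sink1.hdfs.useLocalTimeStamp = true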

You could either land the data in Hadoop first and then run a set of MapReduce jobs to index it into Solr, or you could configure Flume to write data directly to Solr; see the links below.
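For the direct route, Flume ships a MorphlineSolrSink that transforms each event with a morphline configuration and loads it into Solr. A sketch of the sink side only; the `morphline.conf` file, which defines how each log line is parsed and which Solr collection it goes to, is a separate file and is omitted here:

```
# Replace the HDFS sink above with a Solr sink; the morphline.conf
# path is a placeholder and the file must be written separately.
agent.sinks.solr1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solr1.channel = mem
agent.sinks.solr1.morphlineFile = /etc/flume/conf/morphline.conf
agent.sinks.solr1.batchSize = 1000
agent.sinks.solr1.batchDurationMillis = 1000
```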

Links:

  1. Using flume solr sink
  2. Flume Regex Filtering Interceptor

EDIT:

It seems that RDS instances don't allow SSH access, which means you cannot run Flume on the RDS instance itself. Instead, you have to periodically pull the RDS instance's logs to a machine (for example, an EC2 instance) that has Flume configured.
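A minimal sketch of that periodic pull in Python using boto3, whose RDS client exposes `describe_db_log_files` and `download_db_log_file_portion` for exactly this; the instance identifier below is a placeholder:

```python
import boto3

# Hypothetical instance name; adjust for your deployment.
INSTANCE_ID = "my-postgres-instance"

rds = boto3.client("rds")

# List the Postgres log files currently available on the RDS instance.
logs = rds.describe_db_log_files(DBInstanceIdentifier=INSTANCE_ID)

for log in logs["DescribeDBLogFiles"]:
    name = log["LogFileName"]
    marker = "0"  # "0" starts reading from the beginning of the file
    with open(name.replace("/", "_"), "w") as out:
        while True:
            # Download the log file in portions; AdditionalDataPending
            # tells us when we have reached the end of the file.
            resp = rds.download_db_log_file_portion(
                DBInstanceIdentifier=INSTANCE_ID,
                LogFileName=name,
                Marker=marker,
            )
            out.write(resp.get("LogFileData", ""))
            if not resp["AdditionalDataPending"]:
                break
            marker = resp["Marker"]
```

Run this from cron on the Flume machine and point Flume's exec or spooling-directory source at the downloaded files. The AWS CLI offers the same operation as `aws rds download-db-log-file-portion` if you prefer a shell script.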
