What is the best approach to getting data into S3 for Elasticsearch and RabbitMQ?

At my company we have developed several games. Some of them send their events to Elasticsearch and others to RabbitMQ. We have a local CLI that pulls the data from both and compiles the messages into gzip-compressed JSON files; a second CLI then converts those files to SQL statements and loads them into a local SQL Server. We now want to scale up, but the current setup is painful and nowhere near real-time for analysis.

I recently built a Python application that I plan to deploy in a Docker container on AWS. The script pulls data from Elasticsearch, compiles it into small compressed JSON files, and publishes them to an S3 bucket. From there the data is ingested into Snowflake for analysis. So far I have been able to get the data in quite quickly, and it looks promising as an alternative.
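For readers who want to picture that script, here is a minimal sketch of the batching and compression step. It is not my actual code: the names `chunk`, `to_gzip_ndjson`, and `upload_batches`, the key layout, and the batch size are all made up for illustration, and it assumes `s3_client` is a boto3 S3 client (only its `put_object` call is used). Newline-delimited JSON is also an assumption about the file layout.

```python
import gzip
import json
from typing import Iterable, Iterator, List


def chunk(events: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` events, preserving order."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def to_gzip_ndjson(events: List[dict]) -> bytes:
    """Serialize events as newline-delimited JSON and gzip the result."""
    payload = "\n".join(json.dumps(e, separators=(",", ":")) for e in events)
    return gzip.compress(payload.encode("utf-8"))


def upload_batches(s3_client, bucket: str, prefix: str,
                   events: Iterable[dict], size: int = 1000) -> List[str]:
    """Write one compressed object per batch and return the keys written.

    `s3_client` is assumed to be a boto3 S3 client; the key naming
    scheme below is hypothetical.
    """
    keys = []
    for i, batch in enumerate(chunk(events, size)):
        key = f"{prefix}/part-{i:05d}.json.gz"
        s3_client.put_object(Bucket=bucket, Key=key, Body=to_gzip_ndjson(batch))
        keys.append(key)
    return keys
```

Passing the client in as an argument keeps the batching logic testable without AWS credentials.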

I was planning to do something similar with RabbitMQ, but I would like to find a better alternative that lets this ingestion happen seamlessly and spares me from implementing all sorts of exception handling in the Python code.

  1. I've researched a bit and found there might be a way to link RabbitMQ to Amazon Kinesis Firehose. My question would be: how would I send the stream from RabbitMQ to Kinesis?

  2. For Elasticsearch, what is the best way to achieve this? I've read about the Logstash plugin for S3 ( https://www.elastic.co/guide/en/logstash/current/plugins-outputs-s3.html ) and about the Logstash plugin for Kinesis ( https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kinesis.html ). Which approach would be ideal for real-time ingestion?

My answer is very theoretical; it needs to be tested in the real world and adapted to your use case. For near-real-time behaviour, I would use Logstash.

You can build a more scalable architecture by outputting to RabbitMQ and using other pipelines to listen to the queue and execute other tasks:

  • Logstash pipeline: Elasticsearch -> RabbitMQ
  • Logstash pipeline: RabbitMQ -> SQL
  • Logstash pipeline: RabbitMQ -> Kinesis
  • Logstash pipeline: RabbitMQ -> AWS
  • etc.
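If the RabbitMQ -> Kinesis leg ends up in Python rather than Logstash, the delivery side could look roughly like the sketch below. It assumes the messages have already been consumed from the queue and decoded into dicts (the consumer itself is not shown), and that `firehose_client` is a boto3 Firehose client. The helper names `firehose_batches` and `send` are hypothetical. The sketch only enforces the 500-records-per-call limit of `put_record_batch`; it does not check the per-call size limit or retry failed records, which a real pipeline would need to do.

```python
import json
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500  # PutRecordBatch accepts at most 500 records per request


def firehose_batches(messages: Iterable[dict],
                     limit: int = MAX_RECORDS_PER_CALL) -> Iterator[List[dict]]:
    """Group messages into lists of Firehose records, at most `limit` each."""
    batch: List[dict] = []
    for msg in messages:
        # A trailing newline keeps records separable once Firehose
        # concatenates them into a single S3 object.
        data = (json.dumps(msg) + "\n").encode("utf-8")
        batch.append({"Data": data})
        if len(batch) == limit:
            yield batch
            batch = []
    if batch:
        yield batch


def send(firehose_client, stream_name: str, messages: Iterable[dict]) -> int:
    """Send all messages and return the number of records Firehose rejected."""
    failed = 0
    for records in firehose_batches(messages):
        response = firehose_client.put_record_batch(
            DeliveryStreamName=stream_name, Records=records
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

A non-zero return value means some records were not accepted and would have to be resent; acknowledging the RabbitMQ messages only after a successful send is one way to avoid losing events.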

