
Logstash / not logstash for kafka-elasticsearch integration?

I read that elasticsearch rivers/river plugins are deprecated. So we cannot have direct elasticsearch-kafka integration. If we want to do this, then we need some Java (or any language) layer in between that pushes the data from Kafka into Elasticsearch using its APIs.
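To make the "middle layer" concrete, here is a minimal sketch of the kind of logic such a layer would contain: turning a batch of raw Kafka message values into an Elasticsearch `_bulk` request body. The function name, index name, and message format are hypothetical; a real service would additionally poll a Kafka consumer and POST this body to the ES `_bulk` endpoint.

```python
import json

def to_bulk_body(messages, index="events"):
    """Convert raw Kafka message values (bytes of JSON) into an
    Elasticsearch _bulk request body (newline-delimited JSON).

    Each document becomes two lines: an action/metadata line
    telling ES which index to write to, followed by the document
    itself. This is the payload a custom consumer would send to
    POST /_bulk."""
    lines = []
    for value in messages:
        doc = json.loads(value)
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline.
    return "\n".join(lines) + "\n"
```

Everything around this function (consumer polling, offset commits, retries, error handling) is the operational overhead you would take on yourself instead of delegating it to Logstash.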

On the other hand, if we have kafka-logstash-elasticsearch, then we get rid of the above middle layer and achieve the same thing through Logstash with just configuration. But I am not sure whether having Logstash in between adds overhead or not.

And is my understanding right? Thanks in advance for the inputs.

Regards, Priya

Your question is quite general. It would be good to understand your architecture, its purpose and the assumptions you made.

Kafka, as stated in its documentation, is a massively scalable publish-subscribe messaging system. My assumption would be that you use it as a data broker in your architecture.

Elasticsearch, on the other hand, is a search engine, hence I assume that you use it as a data access/searching/aggregation layer.

These two separate systems require connectors to create a proper data pipeline. That's where Logstash comes in. It allows you to create a streaming data connection between, in your case, Kafka and Elasticsearch. It also allows you to mutate the data on the fly, depending on your needs.
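As a sketch of how little is involved, a minimal Logstash pipeline connecting the two looks like this (broker address, topic name, and index pattern are assumptions for illustration; the `kafka` input and `elasticsearch` output plugins ship with standard Logstash distributions):

```conf
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # assumed Kafka broker address
    topics => ["raw-events"]                # hypothetical topic name
    codec => "json"                         # parse message values as JSON
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]      # assumed ES endpoint
    index => "events-%{+YYYY.MM.dd}"        # daily index, a common pattern
  }
}
```

This is the whole "middle layer" in the kafka-logstash-elasticsearch setup: no custom code, just configuration, with batching, retries and backpressure handled by Logstash itself.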

Ideally, Kafka carries raw data events. Elasticsearch stores documents which are useful to your data consumers (web or mobile applications, other systems etc.), so these can be quite different from the raw data format. If you need to modify the data between its raw form and the ES document, that's where Logstash might be handy (see the filters stage).
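For example, a `filter` block placed between the input and output stages can reshape each event before it is indexed. The field names below are hypothetical; `mutate` and `date` are standard Logstash filter plugins:

```conf
filter {
  mutate {
    rename => { "msg" => "message" }        # hypothetical raw field -> document field
    remove_field => ["internal_debug_id"]   # drop fields consumers don't need
  }
  date {
    match  => ["timestamp", "ISO8601"]      # parse the event's own timestamp...
    target => "@timestamp"                  # ...and use it as the document time
  }
}
```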

Another approach could be to use Kafka Connect, or to build custom tools e.g. based on Kafka Streams or Consumers, but it really depends on the concepts of your architecture: purpose, stack, data requirements and more.
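For completeness, the Kafka Connect route is also configuration-driven. Assuming Confluent's Elasticsearch sink connector is installed, a connector definition submitted to the Connect REST API might look like this (the connector name, topic, and URL are placeholders):

```json
{
  "name": "es-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "raw-events",
    "connection.url": "http://localhost:9200",
    "key.ignore": "true"
  }
}
```

Unlike Logstash, Connect offers no rich filtering stage out of the box, so it fits best when events are already close to their final document shape.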
