Enrich data using a Storm bolt or Spark Streaming with MongoDB
I want to create a Storm spout that reads data from an Apache Kafka topic and sends it to a Storm bolt. The bolt connects to MongoDB and queries it, using a value from the Kafka message, in order to enrich the data. For example: I have a personID (received in a message from Kafka), and I want to look up that person's address in MongoDB using the personID. In my MongoDB collection, every document has a personID and an address.

Can anyone give me an example of that, please? An example using Spark Streaming would also be really great.
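To make it concrete, this is a rough, untested sketch of the kind of bolt I have in mind, assuming Storm 2.x and the MongoDB Java sync driver; the connection string, database/collection names, and the "personID" field name are just placeholders:

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;

// Enrichment bolt: looks up the address for each incoming personID in MongoDB.
public class MongoEnrichmentBolt extends BaseRichBolt {
    // The Mongo client is not serializable, so it is transient and
    // created in prepare(), which runs once per bolt instance on the worker.
    private transient OutputCollector collector;
    private transient MongoClient mongoClient;
    private transient MongoCollection<Document> persons;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Placeholder connection details; adjust to your environment.
        this.mongoClient = MongoClients.create("mongodb://localhost:27017");
        this.persons = mongoClient.getDatabase("mydb").getCollection("persons");
    }

    @Override
    public void execute(Tuple input) {
        // Assumes the upstream Kafka spout emits a field called "personID".
        String personId = input.getStringByField("personID");
        Document doc = persons.find(Filters.eq("personID", personId)).first();
        String address = (doc == null) ? null : doc.getString("address");
        // Emit the enriched tuple, anchored to the input, and ack it.
        collector.emit(input, new Values(personId, address));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("personID", "address"));
    }

    @Override
    public void cleanup() {
        if (mongoClient != null) {
            mongoClient.close();
        }
    }
}
```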
I would approach this as follows:
Perform your data enrichment using Kafka Streams or KSQL. Kafka Streams is part of Apache Kafka and is a Java API. KSQL runs on top of Kafka Streams and gives you a SQL interface for declaring your stream transformations. You can see an example, including joins, in this article.
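As a rough illustration, a minimal Kafka Streams sketch of such an enrichment join might look like the following. It assumes the person/address reference data has been made available in a compacted Kafka topic (here person-addresses, for example streamed out of MongoDB by a Kafka Connect source connector), that the event stream is keyed by personID, and that both topics carry string values; all topic and application names are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

public class EnrichmentApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "person-enrichment");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Events from Kafka, keyed by personID; the value is the raw message.
        KStream<String, String> events = builder.stream("events");

        // Addresses, keyed by personID, read from a compacted topic.
        GlobalKTable<String, String> addresses = builder.globalTable("person-addresses");

        // Enrich each event with the person's address and write the result out.
        events
            .join(addresses,
                  (personId, event) -> personId,               // key to look up in the table
                  (event, address) -> event + " | " + address) // combine event and address
            .to("events-enriched");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Note the GlobalKTable: it gives every instance of the application a full local copy of the reference data, so each lookup is a local read rather than a remote query per message.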
Optionally, if you want to store the resulting enriched data elsewhere, use Kafka Connect to stream it from the Kafka topic to the target.
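For MongoDB as the target specifically, a sink connector configuration could look roughly like this sketch, using the MongoDB Kafka sink connector; the connection URI, database, collection, topic name, and converter settings are all assumptions to adapt:

```properties
name=mongo-enriched-sink
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
topics=events-enriched
connection.uri=mongodb://localhost:27017
database=mydb
collection=events_enriched
# Assumes the enriched records are schemaless JSON.
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```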