Enrich data using a Storm bolt or Spark Streaming with MongoDB
I want to create a Storm spout that reads data from an Apache Kafka topic and sends it to a Storm bolt. The bolt connects to MongoDB and queries it, using a value from the Kafka message, in order to enrich the data. For example: I have a personID (received in a message from Kafka), and I want to look up that person's address in MongoDB using the personID. In my MongoDB collection, every document has a personID and an address.

Can anyone give me an example of that, please? An example using Spark Streaming would also be really great.
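To make it concrete, this is a rough, untested sketch of the kind of bolt I have in mind, assuming Storm 2.x and the MongoDB Java sync driver; the connection string, database/collection names, and the "personID" field name are just placeholders:

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;

// Enrichment bolt: looks up the address for each incoming personID in MongoDB.
public class MongoEnrichmentBolt extends BaseRichBolt {
    // The Mongo client is not serializable, so it is transient and
    // created in prepare(), which runs once per bolt instance on the worker.
    private transient OutputCollector collector;
    private transient MongoClient mongoClient;
    private transient MongoCollection<Document> persons;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Placeholder connection details; adjust to your environment.
        this.mongoClient = MongoClients.create("mongodb://localhost:27017");
        this.persons = mongoClient.getDatabase("mydb").getCollection("persons");
    }

    @Override
    public void execute(Tuple input) {
        // Assumes the upstream Kafka spout emits a field called "personID".
        String personId = input.getStringByField("personID");
        Document doc = persons.find(Filters.eq("personID", personId)).first();
        String address = (doc == null) ? null : doc.getString("address");
        // Emit the enriched tuple, anchored to the input, and ack it.
        collector.emit(input, new Values(personId, address));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("personID", "address"));
    }

    @Override
    public void cleanup() {
        if (mongoClient != null) {
            mongoClient.close();
        }
    }
}
```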
I would approach this as follows:
Perform your data enrichment using Kafka Streams or KSQL. Kafka Streams is part of Apache Kafka and is a Java API. KSQL runs on top of Kafka Streams and gives you a SQL interface for declaring your stream transformations. You can see an example, including joins, in this article.
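As a rough illustration, a minimal Kafka Streams sketch of such an enrichment join might look like the following. It assumes the person/address reference data has been made available in a compacted Kafka topic (here person-addresses, for example streamed out of MongoDB by a Kafka Connect source connector), that the event stream is keyed by personID, and that both topics carry string values; all topic and application names are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

public class EnrichmentApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "person-enrichment");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Events from Kafka, keyed by personID; the value is the raw message.
        KStream<String, String> events = builder.stream("events");

        // Addresses, keyed by personID, read from a compacted topic.
        GlobalKTable<String, String> addresses = builder.globalTable("person-addresses");

        // Enrich each event with the person's address and write the result out.
        events
            .join(addresses,
                  (personId, event) -> personId,               // key to look up in the table
                  (event, address) -> event + " | " + address) // combine event and address
            .to("events-enriched");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Note the GlobalKTable: it gives every instance of the application a full local copy of the reference data, so each lookup is a local read rather than a remote query per message.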
Optionally, if you want to store the resulting enriched data elsewhere, use Kafka Connect to stream it from the Kafka topic to the target.
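For MongoDB as the target specifically, a sink connector configuration could look roughly like this sketch, using the MongoDB Kafka sink connector; the connection URI, database, collection, topic name, and converter settings are all assumptions to adapt:

```properties
name=mongo-enriched-sink
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
topics=events-enriched
connection.uri=mongodb://localhost:27017
database=mydb
collection=events_enriched
# Assumes the enriched records are schemaless JSON.
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```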