
Generate data with Apache Kafka and receive it using Spark Streaming

I would like to know how, in a single program, I can generate random data with Apache Kafka and receive it with Spark Streaming.

Here is a use case:

I want to generate random data of the form (A, B, ab@hotmail.com) for X seconds. Then I want to receive this data and process it in real time (while it is being received), and if the second parameter is B, send an email to 'ab@hotmail.com' with the following message: "The first parameter is A".

I know that I have to start a ZooKeeper server, then start a Kafka broker, then create a topic, and then create a producer to produce and send this data. To create the connection between Kafka and Spark Streaming I need to use the "createStream" function. But I don't know how to use a producer to send this data and then receive it with Spark Streaming to process it, all in the same program and using Java.

Any help? Thank you.

There will not be a single program, but rather two: a Kafka producer program and a Spark Streaming program. For both, examples are available online.
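For illustration, here is a minimal producer sketch in Java using the kafka-clients API. The topic name "test", the broker address localhost:9092, and the one-message-per-second loop are assumptions for this example; adjust them to your setup.

import java.util.Properties;
import java.util.Random;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RandomDataProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        // Assumed broker address; change to match your Kafka setup.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        Random random = new Random();
        String[] letters = {"A", "B", "C"};

        // Emit one random (first, second, email) triple per second for 60 seconds
        // (standing in for the "X seconds" from the question).
        for (int i = 0; i < 60; i++) {
            String first = letters[random.nextInt(letters.length)];
            String second = letters[random.nextInt(letters.length)];
            String value = first + "," + second + ",ab@hotmail.com";
            producer.send(new ProducerRecord<>("test", value));
            Thread.sleep(1000);
        }
        producer.close();
    }
}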

To run this, you start Kafka (including ZooKeeper) and your Spark cluster. Afterwards, you start your producer program that writes into Kafka and your Spark job that reads from Kafka (the order in which you start the producer and the Spark job should not matter).
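The Spark job could look like the following sketch, which uses the receiver-based KafkaUtils.createStream from the spark-streaming-kafka (Kafka 0.8) integration, i.e. the "createStream" function mentioned in the question. The ZooKeeper address, the consumer group id, and the email-sending step (only printed here, since actually sending mail needs a mail library) are assumptions for this example.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaSparkConsumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("KafkaSparkConsumer");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        // One receiver thread for the "test" topic; ZooKeeper assumed at localhost:2181.
        Map<String, Integer> topics = new HashMap<>();
        topics.put("test", 1);

        JavaPairReceiverInputDStream<String, String> stream =
                KafkaUtils.createStream(jssc, "localhost:2181", "my-consumer-group", topics);

        // Each record value is "first,second,email". If the second field is "B",
        // this is where you would send the email; here we only print the message.
        stream.foreachRDD(rdd -> rdd.foreach(record -> {
            String[] fields = record._2().split(",");
            if (fields.length == 3 && "B".equals(fields[1])) {
                System.out.println("Would email " + fields[2]
                        + ": \"The first parameter is " + fields[0] + "\"");
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}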
