Kafka Spark streaming: unable to read messages
I am integrating Kafka and Spark using spark-streaming. I have created a topic as a Kafka producer:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
I am publishing messages to Kafka and trying to read them using spark-streaming Java code and display them on screen.
The daemons are all up: Spark master, worker; ZooKeeper; Kafka.
I am writing Java code to do this, using KafkaUtils.createStream. The code is below:
public class SparkStream {
    public static void main(String args[]) {
        if (args.length != 3) {
            System.out.println("SparkStream <zookeeper_ip> <group_nm> <topic1,topic2,...>");
            System.exit(1);
        }

        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        String[] topic = args[2].split(",");
        for (String t : topic) {
            topicMap.put(t, new Integer(1));
        }

        JavaStreamingContext jssc = new JavaStreamingContext("spark://192.168.88.130:7077", "SparkStream", new Duration(3000));
        JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(jssc, args[0], args[1], topicMap);
        System.out.println("Connection done++++++++++++++");

        JavaDStream<String> data = messages.map(new Function<Tuple2<String, String>, String>() {
            public String call(Tuple2<String, String> message) {
                System.out.println("NewMessage: " + message._2() + "++++++++++++++++++");
                return message._2();
            }
        });

        data.print();
        jssc.start();
        jssc.awaitTermination();
    }
}
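As an aside, the topicMap built above tells KafkaUtils.createStream how many receiver threads to use per topic. A standalone, Spark-free sketch of just that construction (the class and method names here are illustrative, not part of the original program):

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the topic-map construction: each topic name from a
// comma-separated argument maps to a per-topic receiver thread count.
public class TopicMapDemo {
    public static Map<String, Integer> buildTopicMap(String topicCsv, int threadsPerTopic) {
        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        for (String t : topicCsv.split(",")) {
            topicMap.put(t, threadsPerTopic);
        }
        return topicMap;
    }

    public static void main(String[] args) {
        // e.g. "test,logs" with 1 thread each -> {test=1, logs=1}
        System.out.println(buildTopicMap("test,logs", 1));
    }
}
```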
I am running the job, and in another terminal I am running the kafka-producer to publish messages:
Hi kafka
second message
another message
But the output logs at the spark-streaming console don't show the messages; they show zero blocks received:
-------------------------------------------
Time: 1417438988000 ms
-------------------------------------------
2014-12-01 08:03:08,008 INFO [sparkDriver-akka.actor.default-dispatcher-4] scheduler.JobScheduler (Logging.scala:logInfo(59)) - Starting job streaming job 1417438988000 ms.0 from job set of time 1417438988000 ms
2014-12-01 08:03:08,008 INFO [sparkDriver-akka.actor.default-dispatcher-4] scheduler.JobScheduler (Logging.scala:logInfo(59)) - Finished job streaming job 1417438988000 ms.0 from job set of time 1417438988000 ms
2014-12-01 08:03:08,009 INFO [sparkDriver-akka.actor.default-dispatcher-4] scheduler.JobScheduler (Logging.scala:logInfo(59)) - Total delay: 0.008 s for time 1417438988000 ms (execution: 0.000 s)
2014-12-01 08:03:08,010 INFO [sparkDriver-akka.actor.default-dispatcher-15] scheduler.JobScheduler (Logging.scala:logInfo(59)) - Added jobs for time 1417438988000 ms
2014-12-01 08:03:08,015 INFO [sparkDriver-akka.actor.default-dispatcher-15] rdd.MappedRDD (Logging.scala:logInfo(59)) - Removing RDD 39 from persistence list
2014-12-01 08:03:08,024 INFO [sparkDriver-akka.actor.default-dispatcher-4] storage.BlockManager (Logging.scala:logInfo(59)) - Removing RDD 39
2014-12-01 08:03:08,027 INFO [sparkDriver-akka.actor.default-dispatcher-15] rdd.BlockRDD (Logging.scala:logInfo(59)) - Removing RDD 38 from persistence list
2014-12-01 08:03:08,031 INFO [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManager (Logging.scala:logInfo(59)) - Removing RDD 38
2014-12-01 08:03:08,033 INFO [sparkDriver-akka.actor.default-dispatcher-15] kafka.KafkaInputDStream (Logging.scala:logInfo(59)) - Removing blocks of RDD BlockRDD[38] at BlockRDD at ReceiverInputDStream.scala:69 of time 1417438988000 ms
2014-12-01 08:03:09,002 INFO [sparkDriver-akka.actor.default-dispatcher-2] scheduler.ReceiverTracker (Logging.scala:logInfo(59)) - Stream 0 received 0 blocks
Why isn't the data block getting received? I have tried using the Kafka producer and consumer on the console:
bin/kafka-console-producer....
bin/kafka-console-consumer...
They work perfectly, but why not my code? Any idea?
Issue solved. The code above is correct. We just add two more lines to suppress the [INFO] and [WARN] output generated. So the final code is:
package com.spark;

import scala.Tuple2;
import org.apache.log4j.Logger;
import org.apache.log4j.Level;
import kafka.serializer.Decoder;
import kafka.serializer.Encoder;
import org.apache.spark.streaming.Duration;
import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.kafka.*;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import java.util.Map;
import java.util.HashMap;

public class SparkStream {
    public static void main(String args[]) {
        if (args.length != 3) {
            System.out.println("SparkStream <zookeeper_ip> <group_nm> <topic1,topic2,...>");
            System.exit(1);
        }

        Logger.getLogger("org").setLevel(Level.OFF);
        Logger.getLogger("akka").setLevel(Level.OFF);

        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        String[] topic = args[2].split(",");
        for (String t : topic) {
            topicMap.put(t, new Integer(3));
        }

        JavaStreamingContext jssc = new JavaStreamingContext("local[4]", "SparkStream", new Duration(1000));
        JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(jssc, args[0], args[1], topicMap);
        System.out.println("Connection done++++++++++++++");

        JavaDStream<String> data = messages.map(new Function<Tuple2<String, String>, String>() {
            public String call(Tuple2<String, String> message) {
                return message._2();
            }
        });

        data.print();
        jssc.start();
        jssc.awaitTermination();
    }
}
We also need to add a dependency in the POM.xml:
<dependency>
<groupId>com.msiops.footing</groupId>
<artifactId>footing-tuple</artifactId>
<version>0.2</version>
</dependency>
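Beyond that, the Spark streaming artifacts themselves must be on the classpath for this code to compile. A typical set for this era of Spark (the versions here are assumptions; match them to your cluster):

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.1.0</version>
</dependency>
```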
This dependency is used to make use of scala.Tuple2.
The error Stream 0 received 0 blocks was due to the spark-worker not being available, and the spark-worker core count being set to 1. For spark-streaming we need the cores to be >= 2. So we need to make changes in the spark config file; refer to the installation manual. Add the line
export SPARK_WORKER_CORES=5
Also change SPARK_MASTER_IP='hostname' to SPARK_MASTER_IP=<your local IP>. This local IP is what you see in bold when you go to your Spark UI web console, something like: spark://192.168..:<port>
We don't need the port here; only the IP is required.
Now restart your spark-master and spark-worker and start streaming :)
Output:
-------------------------------------------
Time: 1417443060000 ms
-------------------------------------------
message 1
-------------------------------------------
Time: 1417443061000 ms
-------------------------------------------
message 2
-------------------------------------------
Time: 1417443063000 ms
-------------------------------------------
message 3
message 4
-------------------------------------------
Time: 1417443064000 ms
-------------------------------------------
message 5
message 6
messag 7
-------------------------------------------
Time: 1417443065000 ms
-------------------------------------------
message 8
Yes, you need to access the content from the DStream:
messages.foreachRDD(<<processing for the input received in the interval>>);
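To show the shape of that callback without pulling in Spark, here is a stdlib-only sketch: MiniStream and forEachBatch are hypothetical stand-ins for the DStream and foreachRDD, and each micro-batch is modeled as a plain list of records.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Spark-free sketch of the foreachRDD pattern: the supplied action runs once
// per micro-batch. MiniStream/forEachBatch are hypothetical stand-ins for
// JavaPairReceiverInputDStream.foreachRDD; only the call shape matches Spark.
public class ForeachRddSketch {
    static class MiniStream {
        private final List<List<String>> batches;
        MiniStream(List<List<String>> batches) { this.batches = batches; }
        void forEachBatch(Consumer<List<String>> action) {
            for (List<String> batch : batches) {
                action.accept(batch); // Spark would hand you an RDD here
            }
        }
    }

    public static int countRecords(MiniStream stream) {
        final int[] total = {0};
        stream.forEachBatch(new Consumer<List<String>>() {
            public void accept(List<String> batch) {
                total[0] += batch.size(); // per-batch processing goes here
            }
        });
        return total[0];
    }

    public static void main(String[] args) {
        MiniStream s = new MiniStream(Arrays.asList(
                Arrays.asList("message 1"),
                Arrays.asList("message 2", "message 3")));
        System.out.println(countRecords(s));
    }
}
```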