Flink on Yarn, parallel source with Kafka
I want my Kafka source to run in parallel in my Flink job, but so far I have failed to achieve this.
I created 4 partitions for my Kafka topic:
$ ./bin/kafka-topics.sh --describe --zookeeper X.X.X.X:2181 --topic mytopic
Topic:mytopic PartitionCount:4 ReplicationFactor:1 Configs:
Topic: mytopic Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: mytopic Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: mytopic Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: mytopic Partition: 3 Leader: 0 Replicas: 0 Isr: 0
My Scala code is as follows:
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(4)
env.getConfig.setGlobalJobParameters(params)
// **** Kafka CONNECTION ****
val properties = new Properties()
properties.setProperty("bootstrap.servers", params.get("server"))
properties.setProperty("group.id", "test")
// **** Get KAFKA source ****
val stream: DataStream[String] = env.addSource(new FlinkKafkaConsumer010[String](params.get("topic"), new SimpleStringSchema(), properties))
I run the job on YARN:
$ ./bin/flink run -m yarn-cluster -yn 4 -yjm 8192 -ynm test -ys 1 -ytm 8192 myjar.jar --server X.X.X.X:9092 --topic mytopic
I have tried a number of things, but my source is not being parallelized.
Having several Kafka partitions and at least as many slots/task managers should be enough, right?
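For context, with parallelism equal to the partition count, each consumer subtask should end up owning exactly one partition. The sketch below models that distribution with a simple round-robin rule; Flink's actual partition assignment is an internal detail, so the modulo rule here is an assumption for illustration only:

```scala
// Hedged sketch: distributing 4 Kafka partitions over N consumer subtasks
// with a round-robin (modulo) rule. Flink's real assignment logic is
// internal to the Kafka connector; this is an assumed simplification.
def assignPartitions(numPartitions: Int, parallelism: Int): Map[Int, Seq[Int]] =
  (0 until numPartitions)
    .groupBy(partition => partition % parallelism)
    .map { case (subtask, parts) => subtask -> parts.toSeq }

// With 4 partitions and parallelism 4, every subtask owns exactly one partition.
val assignment = assignPartitions(4, 4)
```

If the data itself only ever lands in one partition, however, three of those four subtasks will sit idle no matter how the assignment is done.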
I ran into the same problem. I suggest you check how your producer writes to the topic: if every record is sent with the same fixed key, Kafka's default partitioner hashes that key and routes all records to a single partition, so only one Flink consumer subtask ever receives data. Change
producer.send(new ProducerRecord<String,String>("topicName","yourKey","yourMessage"));
to
producer.send(new ProducerRecord<String,String>("topicName",null,"yourMessage"));
so that records with a null key are distributed across all partitions.
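The effect of the key can be sketched in a few lines of Scala. Kafka's default partitioner actually uses murmur2 hashing and a sticky/round-robin strategy for null keys; a plain `hashCode` and a simple counter stand in here, so treat this as an illustration of the behaviour, not the real client logic:

```scala
// Hedged sketch of Kafka's default partitioning behaviour (simplified):
// a fixed key always hashes to the same partition, while a null key is
// spread over all partitions.
def partitionFor(key: Option[String], numPartitions: Int, roundRobin: Iterator[Int]): Int =
  key match {
    case Some(k) => math.abs(k.hashCode) % numPartitions // same key -> always the same partition
    case None    => roundRobin.next() % numPartitions    // null key -> spread over all partitions
  }

val rr = Iterator.from(0)
val keyed   = (1 to 8).map(_ => partitionFor(Some("yourKey"), 4, rr)) // all identical
val unkeyed = (1 to 8).map(_ => partitionFor(None, 4, rr))            // cycles through 0..3
```

With the fixed key, all eight records land in one partition, which is exactly why only one Flink subtask sees any data.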