FLINK：如何使用相同的StreamExecutionEnvironment从多个kafka集群中读取

Question

I want to read data from multiple KAFKA clusters in FLINK. 我想从FLINK中的多个KAFKA集群中读取数据。

But the result is that the kafkaMessageStream is reading only from first Kafka. 但结果是kafkaMessageStream只从第一个Kafka读取。

I am able to read from both Kafka clusters only if i have 2 streams separately for both Kafka , which is not what i want. 只有当我为Kafka分别有2个流时，我才能从两个Kafka集群中读取，这不是我想要的。

Is it possible to have multiple sources attached to single reader. 是否可以将多个源连接到单个阅读器。

sample code 示例代码

public class KafkaReader<T> implements Reader<T>{

private StreamExecutionEnvironment executionEnvironment ;

public StreamExecutionEnvironment getExecutionEnvironment(Properties properties){
    executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
    executionEnvironment.setRestartStrategy( RestartStrategies.fixedDelayRestart(3, 1500));

    executionEnvironment.enableCheckpointing(
            Integer.parseInt(properties.getProperty(Constants.SSE_CHECKPOINT_INTERVAL,"5000")), CheckpointingMode.EXACTLY_ONCE);
    executionEnvironment.getCheckpointConfig().setCheckpointTimeout(60000);
    //executionEnvironment.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
    //try {
    //  executionEnvironment.setStateBackend(new FsStateBackend(new Path(Constants.SSE_CHECKPOINT_PATH)));
        // The RocksDBStateBackend or The FsStateBackend
    //} catch (IOException e) {
        // LOGGER.error("Exception during initialization of stateBackend in execution environment"+e.getMessage());
    }

    return executionEnvironment;
}
public DataStream<T> readFromMultiKafka(Properties properties_k1, Properties properties_k2 ,DeserializationSchema<T> deserializationSchema) {


    DataStream<T> kafkaMessageStream = executionEnvironment.addSource(new FlinkKafkaConsumer08<T>( 
            properties_k1.getProperty(Constants.TOPIC),deserializationSchema, 
            properties_k1));
    executionEnvironment.addSource(new FlinkKafkaConsumer08<T>( 
            properties_k2.getProperty(Constants.TOPIC),deserializationSchema, 
            properties_k2));

    return kafkaMessageStream;
}


public DataStream<T> readFromKafka(Properties properties,DeserializationSchema<T> deserializationSchema) {


    DataStream<T> kafkaMessageStream = executionEnvironment.addSource(new FlinkKafkaConsumer08<T>( 
            properties.getProperty(Constants.TOPIC),deserializationSchema, 
            properties));

    return kafkaMessageStream;
}

} }

My calls: 我的电话：

 public static void main( String[] args ) throws Exception
{
    Properties pk1 = new Properties();
    pk1.setProperty(Constants.TOPIC, "flink_test");
    pk1.setProperty("zookeeper.connect", "localhost:2181");
    pk1.setProperty("group.id", "1");
    pk1.setProperty("bootstrap.servers", "localhost:9092");
    Properties pk2 = new Properties();
    pk2.setProperty(Constants.TOPIC, "flink_test");
    pk2.setProperty("zookeeper.connect", "localhost:2182");
    pk2.setProperty("group.id", "1");
    pk2.setProperty("bootstrap.servers", "localhost:9093");


    Reader<String> reader = new KafkaReader<String>();
    //Do not work

    StreamExecutionEnvironment environment = reader.getExecutionEnvironment(pk1);
    DataStream<String> dataStream1 = reader.readFromMultiKafka(pk1,pk2,new SimpleStringSchema());
    DataStream<ImpressionObject> transform = new TsvTransformer().transform(dataStream);

    transform.print();      


  //Works:

    StreamExecutionEnvironment environment = reader.getExecutionEnvironment(pk1);
    DataStream<String> dataStream1 = reader.readFromKafka(pk1, new SimpleStringSchema());
    DataStream<String> dataStream2 = reader.readFromKafka(pk2, new SimpleStringSchema());

    DataStream<Tuple2<String, Integer>> transform1 = dataStream1.flatMap(new LineSplitter()).keyBy(0)
    .timeWindow(Time.seconds(5)).sum(1).setParallelism(5);
    DataStream<Tuple2<String, Integer>> transform2 = dataStream2.flatMap(new LineSplitter()).keyBy(0)
            .timeWindow(Time.seconds(5)).sum(1).setParallelism(5);


    transform1.print();     
    transform2.print();     

    environment.execute("Kafka Reader");
}

Answer 1

To resolve the issue, I would recommend you to create separate instances of the FlinkKafkaConsumer for each cluster (that's what you are already doing), and then union the resulting streams: 要解决此问题，我建议您为每个群集创建单独的FlinkKafkaConsumer实例（这就是您正在做的事情），然后将生成的流联合起来：

StreamExecutionEnvironment environment = reader.getExecutionEnvironment(pk1);
DataStream<String> dataStream1 = reader.readFromKafka(pk1, new SimpleStringSchema());
DataStream<String> dataStream2 = reader.readFromKafka(pk2, new SimpleStringSchema());
DataStream<String> finalStream = dataStream1.union(dataStream2);

FLINK：如何使用相同的StreamExecutionEnvironment从多个kafka集群中读取

问题描述

1 个解决方案

解决方案1
8 已采纳 2016-08-17 07:43:36

FLINK：如何使用相同的StreamExecutionEnvironment从多个kafka集群中读取

问题描述

1 个解决方案

解决方案1 8 已采纳 2016-08-17 07:43:36

解决方案1
8 已采纳 2016-08-17 07:43:36