Spark Streaming: different value decoder per Kafka topic

I need to create a Spark Streaming job that reads from several topics and uses a different decoder for each topic (each topic contains a different Avro-encoded object):

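import io
import avro.io
import avro.schema
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

def decode_avro(message):
    # Parse the Avro schema, then decode the raw message bytes with it
    with open("error_list.avsc") as schema_file:
        schema = avro.schema.parse(schema_file.read())
    bytes_reader = io.BytesIO(message)
    decoder = avro.io.BinaryDecoder(bytes_reader)
    reader = avro.io.DatumReader(schema)
    return reader.read(decoder)

ssc = StreamingContext(sc, 2)  # 2-second batch interval
# Both topics currently share the same value decoder
kvs = KafkaUtils.createDirectStream(ssc, [topic, topic2], {
    "metadata.broker.list": brokers}, valueDecoder=decode_avro)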

I want to know whether it is possible to specify a different decoder callback per topic, or whether the topic name is available inside the decoder function. In the latter case I could use the topic name to pick the Avro schema file and decode all messages in the same function.

Thank you

We have the same case: we read from different topics with different message formats, process each topic, and store the output in dedicated storage per source topic. The right way to go here is to create multiple streams: one stream per topic, in the same application, sharing the same Spark context. Each stream gets its own valueDecoder, and a single stream can still read from multiple topics if they share the same format.
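The stream-per-topic approach could be sketched roughly as follows. This is only an illustration, not tested against a real cluster: `make_avro_decoder`, `create_topic_streams`, the `schema_by_topic` mapping, and the `create_stream` / `decoder_factory` parameters are hypothetical names introduced here, and it assumes the legacy `pyspark.streaming.kafka` module from the question (which is not available in Spark 3.x). The schema text is read once on the driver and parsed inside the decoder, so the decoder closure only carries a string.

```python
import io


def make_avro_decoder(schema_path):
    """Build a valueDecoder bound to one Avro schema file (hypothetical helper)."""
    # Read the schema text once, on the driver.
    with open(schema_path) as schema_file:
        schema_json = schema_file.read()

    def decode(message):
        # Imported lazily so this sketch loads even without avro installed.
        import avro.io
        import avro.schema

        schema = avro.schema.parse(schema_json)
        reader = avro.io.DatumReader(schema)
        return reader.read(avro.io.BinaryDecoder(io.BytesIO(message)))

    return decode


def create_topic_streams(ssc, brokers, schema_by_topic,
                         create_stream=None,
                         decoder_factory=make_avro_decoder):
    """Create one direct stream per topic, each with its own decoder.

    schema_by_topic maps a topic name to its .avsc file (an assumption
    of this sketch). create_stream and decoder_factory are injectable
    so the wiring can be checked without Spark or Kafka.
    """
    if create_stream is None:
        # Legacy DStream Kafka integration, as used in the question.
        from pyspark.streaming.kafka import KafkaUtils
        create_stream = KafkaUtils.createDirectStream

    streams = {}
    for topic, schema_path in schema_by_topic.items():
        streams[topic] = create_stream(
            ssc, [topic], {"metadata.broker.list": brokers},
            valueDecoder=decoder_factory(schema_path))
    return streams
```

With the names from the question, a call might look like `streams = create_topic_streams(ssc, brokers, {topic: "error_list.avsc", topic2: "other.avsc"})`, where `"other.avsc"` is a placeholder for the second topic's schema file. Each entry in `streams` is then processed and stored independently.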
