
Fastest way to sink JSON to Kafka using Flink

Code Optimisation

I have a Flink application that reads data from a URL/port, performs processing on it, and returns JSON. I then convert the JSON to a String and sink it to Kafka.

Current Performance & Noted Issue

If I just perform the processing, I can run about 30,000 strings per second through the function. However, when I add the function to convert the JSON to a String and then sink to Kafka, my throughput drops to 17,000 strings per second.

Do I need to convert my JSON to a String before I sink it to Kafka? If not, how do I sink a JSON ObjectNode to Kafka?

Otherwise, what other solutions are there? I think the bottleneck is the to-String function.

I tried converting the JSON to a String using several methods (the .toString function, a StringBuilder converted to a String).

 // Read from source
 val in_stream = env.socketTextStream(url, port, socket_stream_deliminator, socket_connection_retries)
   .setParallelism(1)
   // Perform processing
   .map(x => Process(x)).setParallelism(1)
   // Convert the ObjectNode to a String (no `return` inside the lambda)
   .map(x => {
     val json_string_builder = StringBuilder.newBuilder
     json_string_builder.append(x)
     json_string_builder.toString()
   }).setParallelism(1)
   // Sink data to Kafka
   .addSink(new FlinkKafkaProducer[String](broker_hosts, global_topic, new SimpleStringSchema()))

I would like to maintain the 30,000 strings per second of processing, which I do get without the convert-to-String function. Can I sink the ObjectNode directly to Kafka?

You can. The sink serializes the given objects to a byte array before sending them to Kafka. Make sure your sink is supplied with a serializer that is capable of converting an ObjectNode to a byte array.
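
For example, a minimal sketch of such a serializer, assuming Jackson's ObjectMapper and a Flink version where SerializationSchema lives in org.apache.flink.api.common.serialization (the class name ObjectNodeSerializationSchema is made up for illustration):

 import com.fasterxml.jackson.databind.ObjectMapper
 import com.fasterxml.jackson.databind.node.ObjectNode
 import org.apache.flink.api.common.serialization.SerializationSchema

 // Hypothetical schema: write the ObjectNode straight to bytes,
 // skipping the intermediate String entirely
 class ObjectNodeSerializationSchema extends SerializationSchema[ObjectNode] {
   // ObjectMapper is not Serializable, so build it lazily on the task manager
   @transient private lazy val mapper = new ObjectMapper()

   override def serialize(element: ObjectNode): Array[Byte] =
     mapper.writeValueAsBytes(element)
 }

With that in place, the to-String map stage can be dropped and the producer typed to ObjectNode:

 .addSink(new FlinkKafkaProducer[ObjectNode](broker_hosts, global_topic, new ObjectNodeSerializationSchema()))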

Also make sure that the consumer is prepared to receive ObjectNode objects, not Strings.
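
If the consumer is another Flink job, one option (a sketch, not from the original post) is the JSONKeyValueDeserializationSchema that ships with Flink's Kafka connector, which yields ObjectNode records:

 import java.util.Properties
 import com.fasterxml.jackson.databind.node.ObjectNode
 import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
 import org.apache.flink.streaming.util.serialization.JSONKeyValueDeserializationSchema

 // Sketch: read the records back as ObjectNodes; each node exposes the
 // record under "value" (and "key", plus metadata if requested)
 val props = new Properties()
 props.setProperty("bootstrap.servers", broker_hosts)
 val json_stream = env.addSource(
   new FlinkKafkaConsumer[ObjectNode](
     global_topic,
     new JSONKeyValueDeserializationSchema(false), // false: skip Kafka metadata
     props))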
