I am trying to set up a simple Kafka stack locally, and I am at the point where I need to create a toy producer. This post: https://lombardo-chcg.github.io/tools/2017/09/29/kafka-avro-producer-in-scala.html (the relevant code is reproduced below) is almost exactly what I want, except:
Here the producer sends a GenericData.Record object, so (I assume) the whole schema is sent and the Schema Registry is not leveraged. I want to send an Array[Byte] whose first few bytes are the schema ID and whose remaining bytes are the data, without the schema (or so I think is the optimal way to do it).
The piece of code I am talking about:
import java.util.Properties

import org.apache.avro.Schema.Parser
import org.apache.avro.generic.GenericData
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.slf4j.LoggerFactory

case class User(name: String, favoriteNumber: Int, favoriteColor: String)

class AvroProducer {
  val logger = LoggerFactory.getLogger(getClass)

  val kafkaBootstrapServer = sys.env("KAFKA_BOOTSTRAP_SERVER")
  val schemaRegistryUrl = sys.env("SCHEMA_REGISTRY_URL")

  val props = new Properties()
  props.put("bootstrap.servers", kafkaBootstrapServer)
  props.put("schema.registry.url", schemaRegistryUrl)
  props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
  props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
  props.put("acks", "1")

  val producer = new KafkaProducer[String, GenericData.Record](props)

  val schemaParser = new Parser

  val key = "key1"
  val valueSchemaJson =
    s"""
    {
      "namespace": "com.avro.junkie",
      "type": "record",
      "name": "User2",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "favoriteNumber", "type": "int"},
        {"name": "favoriteColor", "type": "string"}
      ]
    }
    """
  val valueSchemaAvro = schemaParser.parse(valueSchemaJson)
  val avroRecord = new GenericData.Record(valueSchemaAvro)

  val mary = new User("Mary", 840, "Green")
  avroRecord.put("name", mary.name)
  avroRecord.put("favoriteNumber", mary.favoriteNumber)
  avroRecord.put("favoriteColor", mary.favoriteColor)

  def start = {
    try {
      val record = new ProducerRecord("users", key, avroRecord)
      val ack = producer.send(record).get()
      // grabbing the ack and logging for visibility
      logger.info(s"${ack.toString} written to partition ${ack.partition.toString}")
    }
    catch {
      case e: Throwable => logger.error(e.getMessage, e)
    }
  }
}
Problem(s):
I know how to serialize the whole Avro record to an Array[Byte]:

import java.io.ByteArrayOutputStream

import org.apache.avro.generic.{GenericData, GenericDatumWriter}
import org.apache.avro.io.EncoderFactory

// GenericDatumWriter matches GenericData.Record
// (SpecificDatumWriter is for generated SpecificRecord classes)
val writer = new GenericDatumWriter[GenericData.Record](valueSchemaAvro)
val out = new ByteArrayOutputStream
val encoder = EncoderFactory.get.binaryEncoder(out, null)
writer.write(avroRecord, encoder) // but here I am also writing the schema, right?
encoder.flush()
out.close()
out.toByteArray
Thanks so much!
The first code block does use the Schema Registry: inside KafkaAvroSerializer, the schema is registered (or looked up) to obtain an ID, and the serializer then writes Confluent's wire format for you, namely a magic byte (0x0), the 4-byte big-endian schema ID, and then the Avro-encoded payload. The full schema is never sent with the message. (Likewise, binaryEncoder in your second block writes only the record's data, never the schema, so the worry in your inline comment is unfounded.)
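If you do want to build that wire format by hand, here is a minimal sketch. It assumes Confluent's CachedSchemaRegistryClient with the register(subject, schema) overload that takes an org.apache.avro.Schema (available in the Confluent versions of that era), and it assumes the default topic-name subject strategy, under which the value schema for the "users" topic lives under the subject "users-value". valueSchemaAvro, avroRecord, and schemaRegistryUrl are reused from the code above.

import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.avro.generic.{GenericData, GenericDatumWriter}
import org.apache.avro.io.EncoderFactory

// Register the schema (idempotent if already registered) and get its ID.
val registryClient = new CachedSchemaRegistryClient(schemaRegistryUrl, 100)
val schemaId = registryClient.register("users-value", valueSchemaAvro)

// Avro-encode the record body: data only, no schema.
val out = new ByteArrayOutputStream
val encoder = EncoderFactory.get.binaryEncoder(out, null)
new GenericDatumWriter[GenericData.Record](valueSchemaAvro).write(avroRecord, encoder)
encoder.flush()

// Confluent wire format: magic byte 0x0, 4-byte big-endian schema ID, payload.
val payload = out.toByteArray
val message = ByteBuffer
  .allocate(1 + 4 + payload.length)
  .put(0x0.toByte)
  .putInt(schemaId)
  .put(payload)
  .array()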
If instead you want to bypass the Schema Registry entirely, use ByteArraySerializer as the value serializer and send the result of out.toByteArray from the second code block to the producer.
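For completeness, a minimal sketch of such a raw-bytes producer, reusing kafkaBootstrapServer and key from the question. Sending the plain out.toByteArray bypasses the registry entirely; sending the message built in the sketch above instead keeps the payload readable by consumers that use KafkaAvroDeserializer.

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val rawProps = new Properties()
rawProps.put("bootstrap.servers", kafkaBootstrapServer)
rawProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
// ByteArraySerializer hands the bytes to the broker untouched, so whatever
// framing you chose (schema-ID prefix or none) is exactly what gets stored.
rawProps.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")

val rawProducer = new KafkaProducer[String, Array[Byte]](rawProps)
rawProducer.send(new ProducerRecord("users", key, message)).get()
rawProducer.close()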