
Avro and Kafka by making use of SchemaBuilder

I went through the tutorial from Baeldung. They mention there are two ways to create a schema:

  • By writing the JSON representation and adding the Maven plugin to produce the class
  • By using the SchemaBuilder, which they also mention is a better choice.

Unfortunately, in the Git example I only see the JSON way.

Let's say I have this Avro schema:

{
  "type":"record",
  "name":"TestFile",
  "namespace":"com.example.kafka.data.ingestion.model",
  "fields":[
    {
      "name":"date",
      "type":"long"
    },
    {
      "name":"counter",
      "type":"int"
    },
    {
      "name":"mc",
      "type":"string"
    }
  ]
}

By adding this plugin to my pom file:

<plugin>
   <groupId>org.apache.avro</groupId>
   <artifactId>avro-maven-plugin</artifactId>
   <version>1.8.0</version>
   <executions>
      <execution>
         <id>schemas</id>
         <phase>generate-sources</phase>
         <goals>
            <goal>schema</goal>
            <goal>protocol</goal>
            <goal>idl-protocol</goal>
         </goals>
         <configuration>
            <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
         </configuration>
      </execution>
   </executions>
</plugin>

and building with generate-sources, a TestFile.java is created in the destination I specified. Then, to send to a Kafka topic, I can do the following:

TestFile test = TestFile.newBuilder()
        .setDate(102928374747L)
        .setCounter(2)
        .setMc("Some string")
        .build();
kafkaTemplate.send(topicName, test);

The equivalent of creating the schema with SchemaBuilder would be:

Schema testFileSchema = SchemaBuilder.record("TestFile")
        .namespace("com.example.kafka.data.ingestion.model")
        .fields()
        .requiredLong("date")
        .requiredInt("counter")
        .requiredString("mc")
        .endRecord();

But how can I now generate the POJO and send my TestFile data to my kafka topic?

You won't have access to a TestFile object, since the Schema is created at runtime rather than pre-compiled. If you want to keep that POJO, you would need a constructor such as public TestFile(GenericRecord avroRecord) that copies the fields out of the record.
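
For illustration, here is a minimal sketch of such a converter, assuming the generated TestFile class from the Maven plugin above is still on the classpath; the helper name fromGenericRecord is hypothetical:

// Hypothetical helper: copy fields from a runtime GenericRecord into the
// compiled TestFile POJO. Field names must match the schema exactly.
public static TestFile fromGenericRecord(GenericRecord avroRecord) {
    return TestFile.newBuilder()
            .setDate((Long) avroRecord.get("date"))
            .setCounter((Integer) avroRecord.get("counter"))
            // Avro may return a Utf8 object rather than a java.lang.String
            .setMc(avroRecord.get("mc").toString())
            .build();
}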

You'll need to create a GenericRecord using that Schema object, same as if you were parsing it from a String or a file.

For example,

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

Schema schema = SchemaBuilder.record("TestFile")
        .namespace("com.example.kafka.data.ingestion.model")
        .fields()
        .requiredLong("date")
        .requiredInt("counter")
        .requiredString("mc")
        .endRecord();

GenericRecord entry1 = new GenericData.Record(schema);
entry1.put("date", 1L);
entry1.put("counter", 2);
entry1.put("mc", "3");

// producer.send(new ProducerRecord<>(topic, entry1));
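
Equivalently, if the schema lived in an .avsc file instead of being built in code, the same Schema object could be parsed from it; the file path here is just an assumption:

import java.io.File;
import org.apache.avro.Schema;

// Parse the schema from a file (throws IOException); the resulting
// Schema object is used exactly the same way as the one built above
Schema schema = new Schema.Parser().parse(
        new File("src/main/resources/testfile.avsc"));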

A full Kafka example is available from Confluent.

If you don't include a required field, it'll throw an error, and the values of the types are not checked (I could put "counter", "2", and it would send a string value; this seems like a bug to me). Basically, GenericRecord == HashMap<String, Object> with the added benefit of required/nullable fields.

And you will need to configure an Avro serializer, such as Confluent's, which requires running their Schema Registry, or a version like the one Cloudera shows.
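
As a rough sketch of that first option, the producer configuration could look something like this; Confluent's KafkaAvroSerializer on the classpath and a Schema Registry at localhost:8081 are both assumptions here:

import java.util.Properties;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Confluent's serializer registers/looks up the schema in the Schema Registry
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://localhost:8081");

Producer<String, GenericRecord> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("test-topic", entry1)); // entry1 as above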

Otherwise, you need to convert the Avro object into a byte[] (as shown in your link) and just use the ByteArraySerializer.
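
A minimal sketch of that conversion using Avro's own encoder API, reusing the schema and entry1 from the example above:

import java.io.ByteArrayOutputStream;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;

// Serialize the GenericRecord to raw Avro bytes (throws IOException)
ByteArrayOutputStream out = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
writer.write(entry1, encoder);
encoder.flush();
byte[] payload = out.toByteArray(); // send this with ByteArraySerializer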

As stated in the Baeldung tutorial:

Later we can apply the toString method to get the JSON structure of Schema.

So, for example, using this code inside a main class, you can print the two schema definitions to the console output.

You can then save the resulting JSON representations to .avsc files and generate POJOs as before.

    import java.util.ArrayList;

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;

    Schema clientIdentifier = SchemaBuilder.record("ClientIdentifier")
            .namespace("com.baeldung.avro")
            .fields().requiredString("hostName").requiredString("ipAddress")
            .endRecord();
    System.out.println(clientIdentifier.toString());

    Schema avroHttpRequest = SchemaBuilder.record("AvroHttpRequest")
            .namespace("com.baeldung.avro")
            .fields().requiredLong("requestTime")
            .name("clientIdentifier")
            .type(clientIdentifier)
            .noDefault()
            .name("employeeNames")
            .type()
            .array()
            .items()
            .stringType()
            .arrayDefault(new ArrayList<>())
            .name("active")
            .type()
            .enumeration("Active")
            .symbols("YES","NO")
            .noDefault()
            .endRecord();
    System.out.println(avroHttpRequest.toString());
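
For reference, the first println should produce JSON along these lines; toString() emits it on a single line, and the whitespace here is added for readability:

    {
      "type": "record",
      "name": "ClientIdentifier",
      "namespace": "com.baeldung.avro",
      "fields": [
        {"name": "hostName", "type": "string"},
        {"name": "ipAddress", "type": "string"}
      ]
    }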

There is a third way to generate Avro schemas, which is using Avro IDL.
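
For completeness, the TestFile record from the question would look like this in Avro IDL (a .avdl file, which the idl-protocol goal already present in the pom above can compile); the protocol name here is arbitrary:

// The same TestFile record expressed in Avro IDL
@namespace("com.example.kafka.data.ingestion.model")
protocol TestFileProtocol {
  record TestFile {
    long date;
    int counter;
    string mc;
  }
}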
