
Read Existing Avro File and Send to Kafka

I have an existing Avro file with a schema. I need to send the file to a Kafka producer.

Following is the code I have written:

public class ProducerDataSample {

    public static void main(String[] args) {

        String topic = "my-topic";

        Schema.Parser parser = new Schema.Parser();
        Schema schema = parser.parse(AvroSchemaDefinitionLoader.fromFile("encounter.avsc").get());

        File file = new File("/home/hello.avro");
        try {
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
            DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
            dataFileWriter.create(schema, outputStream);
            dataFileWriter.appendTo(file);
            dataFileWriter.close();
            System.out.println("Here comes the data: " + outputStream);

            // Start KAFKA publishing
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

            KafkaProducer<String, byte[]> messageProducer = new KafkaProducer<String, byte[]>(props);
            ProducerRecord<String, byte[]> producerRecord = null;
            producerRecord = new ProducerRecord<String, byte[]>("m-topic", "1", outputStream.toByteArray());
            messageProducer.send(producerRecord);
            messageProducer.close();
        } catch (Exception e) {
            System.out.println("Error in sending to kafka");
            e.printStackTrace();
        }
    }
}

As soon as I execute this, I get the error:

Error in sending to kafka org.apache.avro.AvroRuntimeException: already open
at org.apache.avro.file.DataFileWriter.assertNotOpen(DataFileWriter.java:85) 
at org.apache.avro.file.DataFileWriter.appendTo(DataFileWriter.java:203)
at org.apache.avro.file.DataFileWriter.appendTo(DataFileWriter.java:193)
at ProducerDataSample.main(ProducerDataSample.java:51)

Any help? Thanks.

The "already open" error occurs because DataFileWriter.create() already opens the writer on the output stream, so the subsequent appendTo(file) fails its not-open assertion. Instead, you will have to read the data from the Avro file and serialize it to a byte array.

Something like the below snippet:

final Schema schema = new Schema.Parser().parse(new File("sample.avsc"));
File file = new File("sample.avro");

// read the avro file to GenericRecord
final GenericDatumReader<GenericRecord> genericDatumReader = new GenericDatumReader<>(schema);
final DataFileReader<GenericRecord> genericRecords = new DataFileReader<>(file, genericDatumReader);

// serialize GenericRecords
ByteArrayOutputStream out = new ByteArrayOutputStream();
DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);

Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(out, null);

while (genericRecords.hasNext()) {
    writer.write(genericRecords.next(), binaryEncoder);
}
binaryEncoder.flush();
out.close();
genericRecords.close();
// send out.toByteArray() to kafka
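To complete the last step, here is a minimal sketch of the send, assuming a broker at localhost:9092 and reusing the topic name and byte-array value serializer from the question; adjust both to your setup.

// Sketch of the "send out.toByteArray() to kafka" step.
// Assumes a broker at localhost:9092 and the topic "my-topic" from the question.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("my-topic", "1", out.toByteArray()));
producer.close();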

I think the other answer should look like this to send individual records as Kafka events.

Note: It should be possible to get the schema directly from the Avro file rather than have a separate AVSC file, using code from the Avro project itself; see the sketch after the code below.

final Schema schema = new Schema.Parser().parse(new File("sample.avsc"));            
File file = new File("sample.avro");

//read the avro file to GenericRecord
final GenericDatumReader<GenericRecord> genericDatumReader = new GenericDatumReader<>(schema);
final DataFileReader<GenericRecord> genericRecords = new DataFileReader<>(file, genericDatumReader);

DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);

while (genericRecords.hasNext()) {
    //serialize GenericRecords
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(out, null);
    writer.write(genericRecords.next(), binaryEncoder);
    binaryEncoder.flush();
    out.close();

    // TODO: send out.toByteArray() to kafka
}

// TODO: kafkaProducer.flush();
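As a sketch of the note above: DataFileReader inherits getSchema() from DataFileStream, so the writer schema embedded in the Avro file header can be read directly, assuming the same sample.avro file.

// Sketch: read the writer schema from the Avro file header itself,
// so no separate "sample.avsc" is needed.
File file = new File("sample.avro");
GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(file, datumReader);
Schema schema = dataFileReader.getSchema(); // schema embedded in the file header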
