Pushing avro file to Kafka

I have an existing Avro file and I want to push the file data into Kafka, but it's not working:

/usr/bin/kafka-console-producer --broker-list test:9092 --topic test < part-m-00000.avro

Thanks

You need to first download the avro-tools JAR file.

Then get the schema from the file:

java -jar avro-tools.jar getschema part-m-00000.avro > schema.avsc

Then install jq, because it will help in a minute to format that schema file as a single-line string.
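As a sketch of what jq does here (using a stand-in schema for illustration, since the real one comes out of `avro-tools getschema`), `jq -r tostring` collapses a pretty-printed schema into a single raw line:

```shell
# Stand-in schema for illustration only; yours comes from avro-tools getschema
printf '{\n  "type": "record",\n  "name": "myrecord",\n  "fields": []\n}\n' > schema.avsc

# Collapse to one raw (unescaped) line, which the producer's value.schema expects
jq -r tostring schema.avsc
# Prints: {"type":"record","name":"myrecord","fields":[]}
```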

Next, Avro messages in Kafka ideally should not contain the schema for every single record, so it would improve your overall topic throughput and network usage if you installed the Avro Schema Registry from Confluent (or the one from Hortonworks, but I've yet to try it).

After that's working, and you have the rest of the Confluent Platform downloaded, there's a script for producing Avro data, but to use it, you need JSON records from the Avro file. Use avro-tools again to get it:

java -jar avro-tools.jar tojson part-m-00000.avro > records.json

Note - this output file will be significantly larger than the Avro file.

Now you're able to produce into the topic: the schema will be sent to the registry, and the JSON records will be converted into binary Avro data by applying that schema:

bin/kafka-avro-console-producer \
    --broker-list localhost:9092 --topic test \
    --property schema.registry.url=http://localhost:8081 \
    --property value.schema="$(jq -r tostring schema.avsc)" < records.json

Note: run jq -r tostring schema.avsc before this command to make sure its output is not an escaped JSON string.
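To see the difference (illustrated with a trivial stand-in document): without -r, jq prints a quoted, escaped JSON string, which the producer will reject; with -r, it prints the raw text:

```shell
# Without -r: output is a quoted, escaped JSON string (wrong for value.schema)
echo '{"type":"string"}' | jq tostring
# Prints: "{\"type\":\"string\"}"

# With -r: output is the raw single-line JSON (what value.schema expects)
echo '{"type":"string"}' | jq -r tostring
# Prints: {"type":"string"}
```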


If that output JSON file is too large, you might also be able to stream the avro-tools output into the producer.

Replace

< records.json 

With

< <(java -jar avro-tools.jar tojson part-m-00000.avro)
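A plain pipe avoids bash-only process substitution entirely. This is a sketch reusing the host/port and file names from the earlier commands; it is written to a script and syntax-checked only, since actually running it needs a broker, the Schema Registry, avro-tools.jar, and the input .avro file:

```shell
# Streaming variant: pipe avro-tools output straight into the producer.
# Saved to a file and syntax-checked only; running it requires Kafka,
# the Schema Registry, avro-tools.jar, and part-m-00000.avro.
cat > produce_stream.sh <<'EOF'
java -jar avro-tools.jar tojson part-m-00000.avro | \
  bin/kafka-avro-console-producer \
    --broker-list localhost:9092 --topic test \
    --property schema.registry.url=http://localhost:8081 \
    --property value.schema="$(jq -r tostring schema.avsc)"
EOF
bash -n produce_stream.sh && echo "syntax OK"
```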

Alternative solutions would include reading the Avro files in Spark, then forwarding those records to Kafka.

If you want to publish Avro messages, you can try kafka-avro-console-producer.

$ ./bin/kafka-avro-console-producer \
             --broker-list localhost:9092 --topic test \
             --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'  < avrofile.avro

It is part of the Confluent open source package. Please refer to the details here: https://docs.confluent.io/3.0.0/quickstart.html

P.S. I could not find these commands in the latest version.
