Pushing avro file to Kafka
I have an existing Avro file and I want to push its data into Kafka, but it's not working:
/usr/bin/kafka-console-producer --broker-list test:9092 --topic test < part-m-00000.avro
Thanks
You need to first download the avro-tools JAR file.
Then get the schema from the file:
java -jar avro-tools.jar getschema part-m-00000.avro > schema.avsc
Then install jq, because it will help in a minute to reformat that schema file into a single line.
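As a sketch of what jq does here (using a hypothetical two-field schema as a stand-in; your actual schema.avsc comes from the getschema step above):

```shell
# Create a sample pretty-printed schema, standing in for avro-tools getschema output
cat > schema.avsc <<'EOF'
{
  "type": "record",
  "name": "myrecord",
  "fields": [
    {"name": "f1", "type": "string"}
  ]
}
EOF

# jq -r tostring collapses it to one compact line,
# ready to pass on the command line as --property value.schema
jq -r tostring schema.avsc
# prints: {"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}
```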
Next, Avro messages in Kafka ideally should not contain the schema for every single record, so your overall topic throughput and network usage would improve if you installed the Avro Schema Registry from Confluent (or the one from Hortonworks, though I've yet to try it).
After that's working, and you have the rest of the Confluent Platform downloaded, there's a script for producing Avro data, but to use it you need JSON records from the Avro file. Use avro-tools again to get them:
java -jar avro-tools.jar tojson part-m-00000.avro > records.json
Note - this output file will be significantly larger than the Avro file
Now you're able to produce: the schema is sent to the registry, and binary Avro data, obtained by applying the schema to the JSON records, goes into the topic:
bin/kafka-avro-console-producer \
--broker-list localhost:9092 --topic test \
--property schema.registry.url=http://localhost:8081 \
--property value.schema="$(jq -r tostring schema.avsc)" < records.json
Note: run jq -r tostring schema.avsc on its own before this command to make sure the result is not an escaped JSON string.
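To see why the -r flag matters, here's a quick check with a minimal stand-in schema (demo.avsc is hypothetical): without -r, jq emits a quoted, escaped JSON string, which the producer would reject.

```shell
echo '{"type": "string"}' > demo.avsc

# Without -r: a quoted, escaped JSON string — not what value.schema wants
jq tostring demo.avsc
# prints: "{\"type\":\"string\"}"

# With -r: the raw single-line JSON — safe to pass as --property value.schema
jq -r tostring demo.avsc
# prints: {"type":"string"}
```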
If the output JSON file is too large, you might also be able to stream the avro-tools output directly into the producer.
Replace
< records.json
with
< <(java -jar avro-tools.jar tojson part-m-00000.avro)
(This uses bash process substitution; a plain $(...) would not work here, because the shell would try to treat the JSON output as a filename.)
Alternative solutions would include reading the Avro files in Spark, then forwarding those records to Kafka.
If you want to publish Avro messages, you can try kafka-avro-console-producer:
$ ./bin/kafka-avro-console-producer \
--broker-list localhost:9092 --topic test \
--property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' < records.json
Note that the input must be JSON-encoded records matching the schema (one per line), not the raw Avro file.
It is part of the Confluent open source package. Please refer to the details here: https://docs.confluent.io/3.0.0/quickstart.html
PS: Could not find the commands in the latest version.