
Kafka Connect HDFS Sink with Azure Blob Storage

I want to connect to Azure Blob Storage with the Kafka Connect HDFS Sink connector. So far I have:

  1. Set the kafka-connect properties:

     hdfs.url=wasbs://<my_url>
     hadoop.conf.dir={hadoop_3_home}/etc/hadoop/
     hadoop.home={hadoop_3_home}
  2. Added support for wasbs in core-site.xml:

     <property>
       <name>fs.wasbs.impl</name>
       <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
     </property>
  3. Exported the HADOOP_CLASSPATH variable and added it to PATH
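Besides registering the filesystem implementation, access to a `wasbs://` URL normally also needs the storage account key in core-site.xml. A sketch only: `myaccount` is a placeholder for the actual storage account name, and the key comes from the Azure portal:

```xml
<!-- Sketch: "myaccount" is a placeholder storage account name. -->
<property>
  <name>fs.azure.account.key.myaccount.blob.core.windows.net</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
```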

But Hadoop still cannot find the class NativeAzureFileSystem:

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at io.confluent.connect.hdfs.storage.StorageFactory.createStorage(StorageFactory.java:29)
 ... 11 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
 at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
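This ClassNotFoundException usually means that hadoop-azure.jar (which contains NativeAzureFileSystem) and its azure-storage dependency are not on the classpath the connector actually uses. In a stock Hadoop 3 layout these jars sit under share/hadoop/tools/lib, which is not on Hadoop's default classpath. A sketch, assuming that layout (`/path/to/confluent` is a placeholder for your Confluent installation):

```shell
# Sketch: paths assume a stock Hadoop 3 layout; adjust {hadoop_3_home}.
# hadoop-azure-*.jar and azure-storage-*.jar live under tools/lib,
# which is NOT on Hadoop's default classpath.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:{hadoop_3_home}/share/hadoop/tools/lib/*"

# Alternatively, copy the jars next to the HDFS connector's own jars
# so Kafka Connect's plugin classloader picks them up:
cp {hadoop_3_home}/share/hadoop/tools/lib/hadoop-azure-*.jar \
   {hadoop_3_home}/share/hadoop/tools/lib/azure-storage-*.jar \
   /path/to/confluent/share/java/kafka-connect-hdfs/
```

Note that exporting HADOOP_CLASSPATH only helps processes launched through the Hadoop scripts; Kafka Connect runs in its own JVM, which is why copying the jars into the connector's directory is often the more reliable route.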

Could you please help with this issue? Is it even possible?

My goal is to back up everything from Kafka to Azure Blob Storage, in any data format.

The HDFS and cloud connectors can't back up "any format". Confluent's Avro is the first-class citizen among file formats, with JSON second, but there is no "plain text" format from what I've found. I think the HDFS connector does support a "byte array" format.
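The output format is selected with the `format.class` property in the connector configuration. A sketch only: the class names below are based on the Confluent HDFS connector and may vary by version, so verify them against the connector's own documentation:

```properties
# Sketch of an HDFS Sink config targeting wasbs; verify format.class
# names against your connector version.
name=azure-blob-backup
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=my_topic
hdfs.url=wasbs://<my_url>
format.class=io.confluent.connect.hdfs.avro.AvroFormat
# JSON alternative:
# format.class=io.confluent.connect.hdfs.json.JsonFormat
```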

As I mentioned in the comments, in my opinion a backup of Kafka is different from retaining the data indefinitely on a file system. Backing up Kafka to Kafka involves using MirrorMaker.
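For the Kafka-to-Kafka route, a minimal legacy MirrorMaker invocation looks like this; `consumer.properties` (pointing at the source cluster) and `producer.properties` (pointing at the backup cluster) are assumed to exist:

```shell
# Sketch: mirror every topic from the source cluster to a backup cluster.
# consumer.properties / producer.properties are assumed config files
# with bootstrap.servers for source and target respectively.
bin/kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist '.*'
```

This keeps the data in Kafka's own format and preserves the ability to re-consume it, which is what distinguishes a backup from a file-system dump.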

If you want to use any format, Spark, Flink, NiFi, or StreamSets have more flexibility for handling that out of the box.
