
How to connect Apache Kafka with Amazon S3?

I want to store data from Kafka into an S3 bucket using Kafka Connect. I already have a Kafka topic running and an S3 bucket created. My topic has Protobuf data in it. I tried https://github.com/qubole/streamx and got the following error:

 [2018-10-04 13:35:46,512] INFO Revoking previously assigned partitions [] for group connect-s3-sink (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:280)
 [2018-10-04 13:35:46,512] INFO (Re-)joining group connect-s3-sink (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:326)
 [2018-10-04 13:35:46,645] INFO Successfully joined group connect-s3-sink with generation 1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:434)
 [2018-10-04 13:35:46,692] INFO Setting newly assigned partitions [ssp.impressions-11, ssp.impressions-10, ssp.impressions-7, ssp.impressions-6, ssp.impressions-9, ssp.impressions-8, ssp.impressions-3, ssp.impressions-2, ssp.impressions-5, ssp.impressions-4, ssp.impressions-1, ssp.impressions-0] for group connect-s3-sink (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:219)
 [2018-10-04 13:35:47,193] ERROR Task s3-sink-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:142)
 java.lang.NullPointerException
    at io.confluent.connect.hdfs.HdfsSinkTask.close(HdfsSinkTask.java:122)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:290)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:421)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:146)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2018-10-04 13:35:47,194] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:143)
[2018-10-04 13:35:51,235] INFO Reflections took 6844 ms to scan 259 urls, producing 13517 keys and 95788 values (org.reflections.Reflections:229)

These are the steps I followed:

  1. I cloned the repository.
  2. mvn -DskipTests package
  3. nano config/connect-standalone.properties

     bootstrap.servers=ip-myip.ec2.internal:9092
     key.converter=com.qubole.streamx.ByteArrayConverter
     value.converter=com.qubole.streamx.ByteArrayConverter
  4. nano config/quickstart-s3.properties

     name=s3-sink
     connector.class=com.qubole.streamx.s3.S3SinkConnector
     format.class=com.qubole.streamx.SourceFormat
     tasks.max=1
     topics=ssp.impressions
     flush.size=3
     s3.url=s3://myaccess_key:mysecret_key@mybucket/demo
  5. connect-standalone /etc/kafka/connect-standalone.properties quickstart-s3.properties

I would like to know whether what I did is correct, or if there is another way to get data from Kafka into S3.

You can use Kafka Connect for this integration, with the Kafka Connect S3 connector.

Kafka Connect is part of Apache Kafka, and the S3 connector is an open-source connector available either standalone or as part of Confluent Platform.
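For example, with Confluent's kafka-connect-s3 plugin installed, a minimal sink configuration might look like the sketch below (the bucket name and region are placeholders). Since your messages are Protobuf-encoded, it passes the raw bytes straight through with ByteArrayConverter rather than trying to deserialize them:

     name=s3-sink
     connector.class=io.confluent.connect.s3.S3SinkConnector
     tasks.max=1
     topics=ssp.impressions
     # Placeholder bucket and region; credentials come from the standard AWS chain
     s3.bucket.name=mybucket
     s3.region=us-east-1
     storage.class=io.confluent.connect.s3.storage.S3Storage
     # Keep the Protobuf payload as raw bytes; one S3 object per flush.size records
     format.class=io.confluent.connect.s3.format.bytearray.ByteArrayFormat
     key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
     value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
     flush.size=3

You would run it the same way as your step 5, passing this file to connect-standalone along with the worker properties.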

For general information and examples of Kafka Connect, this series of articles might help.

Disclaimer: I work for Confluent, and wrote the above blog articles.


April 2020: I have recorded a video showing how to use the S3 sink: https://rmoff.dev/kafka-s3-video

Another approach is to write a consumer that uses log rotation, and then upload the rotated files to S3.
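As a very rough sketch of that approach (assuming the stock Kafka CLI tools, Apache's rotatelogs, and the AWS CLI; the broker, topic, and bucket are the placeholders from the question):

     # Consume the topic and rotate the output file every hour
     kafka-console-consumer --bootstrap-server ip-myip.ec2.internal:9092 \
       --topic ssp.impressions \
       | rotatelogs ./impressions.%Y%m%d%H 3600

     # Periodically (e.g. from cron) ship completed files to S3
     aws s3 cp . s3://mybucket/demo/ --recursive --exclude "*" --include "impressions.*"

For binary Protobuf payloads you would replace the console consumer with a real consumer that writes records with proper framing, since newline-delimited output is lossy for binary data.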
