Configuring connectors for multiple topics on Kafka Connect Distributed Mode
We have producers that send logs to Kafka (the topics are listed in the topics setting below). kafka-connect-elasticsearch instances function as consumers to ship data from Kafka to Elasticsearch. The hello-world Sink configuration for kafka-connect-elasticsearch might look like this:
# elasticsearch.properties
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=24
topics=syslog,nginx,zeek.broker.log,zeek.capture_loss.log,zeek.conn.log,zeek.dhcp.log,zeek.dns.log,zeek.files.log,zeek.http.log,zeek.known_services.log,zeek.loaded_scripts.log,zeek.notice.log,zeek.ntp.log,zeek.packet_filtering.log,zeek.software.log,zeek.ssh.log,zeek.ssl.log,zeek.status.log,zeek.stderr.log,zeek.stdout.log,zeek.weird.log,zeek.x509.log
topic.creation.enable=true
key.ignore=true
schema.ignore=true
...
It can be invoked with bin/connect-standalone.sh. I realized that running, or attempting to run, tasks.max=24 when all the work is performed in a single process is not ideal. I know that distributed mode would be a better alternative, but I am unclear on the performance-optimal way to submit connectors to it. Namely:
In distributed mode, would I still want to submit just a single elasticsearch.properties through a single API call? Or would it be best to break up multiple .properties configs + connectors (e.g. one for syslog, one for nginx, one for zeek.**) and submit them separately?
I understand that the number of tasks should be equal to the number of topics x number of partitions, but what dictates the number of workers?

In distributed mode, would I still want to submit just a single elasticsearch.properties through a single API call?
It'd be a JSON file, but yes.
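To make the "JSON file" point concrete, here is a minimal sketch of turning a standalone-style .properties config into the body that the Kafka Connect REST API accepts on POST /connectors. The helper names, the shortened topics list, and the localhost URL are assumptions for illustration; 8083 is the default REST port, but your workers may listen elsewhere.

```python
import json
import urllib.request


def properties_to_connector_payload(properties_text):
    """Turn standalone-style .properties text into a POST /connectors body."""
    config = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    # The connector name sits at the top level of the request body;
    # every other setting goes under "config" as a string value.
    name = config.pop("name")
    return {"name": name, "config": config}


def submit_connector(payload, base_url="http://localhost:8083"):
    """POST the payload to a Connect worker (base_url is an assumption)."""
    request = urllib.request.Request(
        base_url + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Shortened version of the config from the question (topics list abbreviated).
EXAMPLE_PROPERTIES = """\
# elasticsearch.properties
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=24
topics=syslog,nginx
key.ignore=true
schema.ignore=true
"""

payload = properties_to_connector_payload(EXAMPLE_PROPERTIES)
print(json.dumps(payload, indent=2))
```

Calling submit_connector(payload) against any one worker in the cluster should be enough; the workers share connector configs among themselves, so the request does not need to be repeated per worker.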
What dictates the number of workers?
Up to you. JVM usage is one factor that you can monitor and scale on.
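As a rough way to reason about task and worker counts: sink tasks consume as members of one consumer group, so each partition is read by at most one task, and tasks beyond the total partition count sit idle. The sketch below only does that arithmetic. The partition counts are made up, the function names are hypothetical, and the even spread across workers is an approximation of how Connect balances tasks, not a guarantee.

```python
def busy_sink_tasks(partitions_per_topic, tasks_max):
    """Upper bound on sink tasks that can actually be assigned partitions."""
    total_partitions = sum(partitions_per_topic.values())
    return min(tasks_max, total_partitions)


def tasks_per_worker(task_count, worker_count):
    """Approximate an even spread of tasks across workers."""
    base, extra = divmod(task_count, worker_count)
    return [base + 1 if i < extra else base for i in range(worker_count)]


# Hypothetical partition counts; substitute the real ones for your topics.
partitions = {"syslog": 6, "nginx": 6, "zeek.conn.log": 3}

print(busy_sink_tasks(partitions, tasks_max=24))
print(tasks_per_worker(24, 3))
```

With numbers like these, the per-worker task count gives a starting point; watching JVM heap and CPU on each worker then tells you whether to add or remove workers.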
There isn't really any documentation on this that I am aware of.