Configuring connectors for multiple topics on Kafka Connect Distributed Mode

We have producers that are sending the following to Kafka:

  • topic=syslog, ~25,000 events per day
  • topic=nginx, ~5,000 events per day
  • topic=zeek.xxx.log, ~100,000 events per day (total). In this last case there are 20 distinct zeek topics, such as zeek.conn.log and zeek.http.log.
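For scale, these daily volumes correspond to small average rates. A quick conversion, assuming events arrive evenly over 24 hours (real log traffic is burstier, so peaks will be higher than these averages):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: int) -> float:
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


daily_volumes = {
    "syslog": 25_000,
    "nginx": 5_000,
    "zeek.* (all 20 topics)": 100_000,
}
for topic, per_day in daily_volumes.items():
    print(f"{topic}: {events_per_second(per_day):.2f} events/s")
print(f"total: {events_per_second(sum(daily_volumes.values())):.2f} events/s")
```

This prints about 0.29, 0.06 and 1.16 events/s respectively, roughly 1.5 events/s in total.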

kafka-connect-elasticsearch instances act as consumers that ship data from Kafka to Elasticsearch. The hello-world sink configuration for kafka-connect-elasticsearch might look like this:

# elasticsearch.properties
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=24
topics=syslog,nginx,zeek.broker.log,zeek.capture_loss.log,zeek.conn.log,zeek.dhcp.log,zeek.dns.log,zeek.files.log,zeek.http.log,zeek.known_services.log,zeek.loaded_scripts.log,zeek.notice.log,zeek.ntp.log,zeek.packet_filtering.log,zeek.software.log,zeek.ssh.log,zeek.ssl.log,zeek.status.log,zeek.stderr.log,zeek.stdout.log,zeek.weird.log,zeek.x509.log
topic.creation.enable=true
key.ignore=true
schema.ignore=true
...

It can be run with bin/connect-standalone.sh (passing a worker config followed by this file). I realized that running, or attempting to run, tasks.max=24 when all the work is performed in a single process is not ideal. I know that distributed mode would be a better alternative, but I am unclear on the best way, performance-wise, to submit connectors in distributed mode. Namely:

  • In distributed mode, would I still want to submit just a single elasticsearch.properties through a single API call? Or would it be better to split it into multiple .properties configs and connectors (e.g. one for syslog, one for nginx, one for zeek.*) and submit them separately?
  • I understand that the number of tasks should equal the total number of partitions (number of topics × partitions per topic), but what dictates the number of workers?
  • Is there anywhere in the documentation that walks through best practices for a situation like this, where throughput differs noticeably between topics?

In distributed mode, would I still want to submit just a single elasticsearch.properties through a single API call?

It'd be a JSON payload sent to the REST API rather than a .properties file, but yes.
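A minimal sketch of that conversion, assuming Python 3.9+, a worker REST listener at http://localhost:8083 (the default port) and a properties file made only of simple key=value lines. It builds a request for `PUT /connectors/{name}/config`, which creates the connector if it does not exist and updates it otherwise. The helper names `parse_properties` and `build_put_request` are hypothetical, and the parser is deliberately simplified.

```python
import json
import urllib.request


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines from a Java-style .properties file.

    Simplification (assumption): blank lines, '#'/'!' comments and lines
    without '=' (such as a '...' placeholder) are skipped; ':' separators,
    escapes and line continuations from the full format are NOT handled.
    """
    config: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def build_put_request(config: dict[str, str],
                      base_url: str = "http://localhost:8083") -> urllib.request.Request:
    """Build (but do not send) PUT /connectors/{name}/config.

    The body is the flat config map; a "name" entry is accepted as long
    as it matches the connector name in the URL path.
    """
    name = config["name"]
    return urllib.request.Request(
        f"{base_url}/connectors/{name}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Reading elasticsearch.properties, passing its text through `parse_properties`, and then calling `urllib.request.urlopen(build_put_request(...))` would submit it to a running worker. Because `PUT` is idempotent, re-running it after editing the file updates the existing connector, whereas a second `POST /connectors` with the same name is rejected as a conflict.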

what dictates the number of workers?

Up to you. JVM resource usage is one factor that you can monitor and scale on.
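In distributed mode the worker count is simply how many worker processes you start with the same `group.id`; the cluster then spreads the connector's tasks across them. For a sink connector, all tasks share one consumer group and each partition is read by at most one task, so tasks beyond the total partition count get nothing to do. A rough sizing sketch: the partition counts (one per topic) are hypothetical since the real ones are not stated, and `even_spread` is an idealization, not Connect's actual assignment logic.

```python
def effective_sink_tasks(tasks_max: int, partitions_per_topic: dict[str, int]) -> int:
    """Number of sink tasks that can actually be assigned partitions.

    All tasks of one sink connector share a consumer group, and each
    partition goes to at most one member, so extra tasks sit idle.
    """
    total_partitions = sum(partitions_per_topic.values())
    return min(tasks_max, total_partitions)


def even_spread(num_tasks: int, num_workers: int) -> list[int]:
    """Idealized task count per worker if tasks were spread evenly.

    The real placement is decided by the Connect cluster's rebalancing
    and may differ; this only shows the rough load per worker.
    """
    base, extra = divmod(num_tasks, num_workers)
    return [base + 1 if i < extra else base for i in range(num_workers)]


# Hypothetical partition counts: one partition for each of the 22 topics
# listed in elasticsearch.properties.
ZEEK_LOGS = ["broker", "capture_loss", "conn", "dhcp", "dns", "files", "http",
             "known_services", "loaded_scripts", "notice", "ntp",
             "packet_filtering", "software", "ssh", "ssl", "status",
             "stderr", "stdout", "weird", "x509"]
partitions = {"syslog": 1, "nginx": 1}
partitions.update({f"zeek.{name}.log": 1 for name in ZEEK_LOGS})

tasks = effective_sink_tasks(24, partitions)
print(tasks)                  # 22: two of the 24 requested tasks would sit idle
print(even_spread(tasks, 3))  # [8, 7, 7] across three workers
```

Under these assumptions, adding partitions to the busier topics raises the ceiling on useful tasks, while adding workers only spreads the same tasks over more JVMs.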

As for documentation on best practices for uneven topic throughput: not really any that I am aware of.
