org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata for Kafka Cluster using jaas SASL config authentication

I am trying to deploy a Google Cloud Dataflow pipeline which reads from a Kafka cluster, processes its records, and then writes the results to BigQuery. However, I keep encountering the following exception when attempting to deploy:

org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata for Kafka Cluster

The Kafka cluster requires the use of a JAAS configuration for authentication, and I use the code below to set the properties required for the KafkaIO.read Apache Beam method:

// Kafka consumer properties: SASL/SCRAM-SHA-512 credentials plus a long request timeout
    Map<String, Object> kafkaProperties = new HashMap<String, Object>(){{
        put("request.timeout.ms", 900000); // 15 minutes, to rule out short timeouts
        put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
        put(SaslConfigs.SASL_MECHANISM, "SCRAM-SHA-512");
        put(SaslConfigs.SASL_JAAS_CONFIG, "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"USERNAME\" password=\"PASSWORD\";");
        put(CommonClientConfigs.GROUP_ID_CONFIG, GROUP_ID);
    }};

    // Build & execute pipeline
    pipeline
            .apply(
                    "ReadFromKafka",
                    KafkaIO.<Long, String>read()
                            .withBootstrapServers(properties.getProperty("kafka.servers"))
                            .withKeyDeserializer(LongDeserializer.class)
                            .withValueDeserializer(StringDeserializer.class)
                            .withTopic(properties.getProperty("kafka.topic")).withConsumerConfigUpdates(kafkaProperties))

The Dataflow pipeline is to be deployed with public IPs disabled, but there is an established VPN tunnel from our Google Cloud VPC network to the Kafka cluster, the required routing for the private IPs on both sides is configured, and their IPs are whitelisted. I am able to ping and connect to the socket of the Kafka server using a Compute Engine VM in the same VPN subnetwork as the Dataflow job to be deployed.

I was thinking that there is an issue with the configuration, but I am not able to figure out if I am missing an additional field, or if one of the existing ones is misconfigured. Does anyone know how I can diagnose the problem further, since the exception thrown does not really pinpoint the issue? Any help would be greatly appreciated.
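One way to test this outside of Beam is to run a plain Kafka AdminClient with the exact same properties from a VM in the same subnetwork; if the metadata fetch fails there too, the problem is the broker connection or the SASL config rather than the pipeline. A minimal sketch, assuming the same SCRAM credentials as above (the broker address and topic name are placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.CommonClientConfigs;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.common.config.SaslConfigs;

    public class KafkaConnectivityCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Same connection/auth settings the pipeline uses
            props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "BROKER_HOST:9092");
            props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
            props.put(SaslConfigs.SASL_MECHANISM, "SCRAM-SHA-512");
            props.put(SaslConfigs.SASL_JAAS_CONFIG,
                    "org.apache.kafka.common.security.scram.ScramLoginModule required "
                            + "username=\"USERNAME\" password=\"PASSWORD\";");
            try (AdminClient admin = AdminClient.create(props)) {
                // A successful describeTopics call proves metadata can be fetched over SASL
                System.out.println(
                        admin.describeTopics(Collections.singleton("YOUR_TOPIC")).all().get());
            }
        }
    }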

Edit: I am now able to successfully deploy the Dataflow job, but it appears as though the read is not functioning correctly. After viewing the logs to check for errors in the Dataflow job, I can see that after the group coordinator for the Kafka topic is discovered, there are no further log statements before a warning saying that closing the idle reader timed out:

Close timed out with 1 pending requests to coordinator, terminating client connections

followed by an uncaught exception with the root cause being:

org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition test-partition could be determined

There is then an error stating:

Execution of work for P0 for key 0000000000000001 failed. Will retry locally.

Could this maybe be an issue with the key definition, since the Kafka topics actually do not have keys in the messages? When I view the topic in Kafka Tool, the only columns observed in the data consist of Offset, Message, and a Timestamp.
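If the absence of keys turns out to matter, one option would be to read the key as a nullable String instead of a Long, since a String deserializer simply returns null for a missing key; a minimal sketch reusing the same properties map as above:

    // Variant of the read that does not assume Long-encoded keys (keys may be null)
    pipeline.apply(
            "ReadFromKafka",
            KafkaIO.<String, String>read()
                    .withBootstrapServers(properties.getProperty("kafka.servers"))
                    .withKeyDeserializer(StringDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class)
                    .withTopic(properties.getProperty("kafka.topic"))
                    .withConsumerConfigUpdates(kafkaProperties));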

Based on the last comment, I assume that you're experiencing the issue more with the network stack than with any missing configuration in the Dataflow pipeline, in terms of the Dataflow job runners' connections to the Kafka brokers.

Basically, when you use a pool of public IP addresses for the Dataflow workers, you have the simplest way to reach an external Kafka cluster, with no extra configuration to apply on either side, as you don't need to launch a VPC network between the parties and do the routine network work to get all the routes working.

However, Cloud VPN brings some more complications: implementing a VPC network on both sides and further adjusting the VPN gateway, forwarding rules, and the addressing pool for this VPC. On the other hand, from the Dataflow runtime perspective, you don't need to spread public IP addresses across the Dataflow runners, which undoubtedly reduces the price.
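For reference, the private-IP deployment you describe is typically wired up through the standard DataflowPipelineOptions; a sketch like the following, where the region and subnetwork path are placeholders for your own VPC:

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Run the workers on private IPs inside the VPC that carries the VPN tunnel
    DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setUsePublicIps(false); // workers receive only private IPs
    options.setSubnetwork("regions/REGION/subnetworks/SUBNET_NAME");
    Pipeline pipeline = Pipeline.create(options);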

The problem that you've mentioned lies primarily on the Kafka cluster side. Because Apache Kafka is a distributed system, it follows a core principle: when a producer/consumer executes, it requests metadata about which broker is the leader for a partition and receives metadata with the endpoints available for that partition; the client then uses those endpoints to connect to the particular broker. As far as I understand, in your case the connection to the leader is performed through the listener bound to the external network interface, configured in the server.properties broker settings.

Therefore, you might consider creating a separate listener (if one doesn't exist) in listeners bound to the cloud VPC network interface and, if necessary, propagating advertised.listeners so that the metadata going back to the client contains the data for connecting to the particular broker.
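As an illustration, the broker-side change might look like the following server.properties fragment, where the listener names, addresses, and ports are assumptions to adapt to your own network:

    # Existing external listener plus an additional one bound to the VPC-facing interface
    listeners=EXTERNAL://0.0.0.0:9092,VPC://10.10.0.5:9093
    # Addresses returned to clients in metadata; VPN clients must get the private address
    advertised.listeners=EXTERNAL://kafka.example.com:9092,VPC://10.10.0.5:9093
    listener.security.protocol.map=EXTERNAL:SASL_PLAINTEXT,VPC:SASL_PLAINTEXT
    # With named listeners, the broker also needs to know which one to use internally
    inter.broker.listener.name=EXTERNAL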
