
Flink on Kubernetes with ZooKeeper HA: job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint

When I deploy a Flink job to Kubernetes with ZooKeeper HA, I get the error below.

Our setup is a job cluster with one job (JobManager) pod and one task (TaskManager) pod. We want the task pod to keep working when we delete the job pod.

 job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint

Below is my configuration:

high-availability: zookeeper
high-availability.storageDir: file:///opt/flink/data/
high-availability.zookeeper.quorum: zk-0.zk-hs:2181,zk-1.zk-hs:2181,zk-2.zk-hs:2181
high-availability.zookeeper.client.acl: open
high-availability.zookeeper.path.root: /flinkha
high-availability.cluster-id: /flink-job-service-kpi-ofcwy
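Note that this configuration does not set high-availability.jobmanager.port. A minimal Python sketch (an illustration, not part of Flink) that parses flat key: value lines like the ones above and reports the effective value of that option. It assumes Flink's documented default of "0", meaning a random free port is chosen for the JobManager RPC endpoint in HA mode; the helper names are hypothetical.

```python
from __future__ import annotations

# Hypothetical helpers: parse flat "key: value" lines (the flink-conf.yaml format
# used in the question) and report the effective HA JobManager RPC port.
HA_PORT_KEY = "high-availability.jobmanager.port"


def parse_flink_conf(text: str) -> dict[str, str]:
    """Parse flat 'key: value' lines, skipping blank lines and '#' comments."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as file:///... stay intact.
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def effective_ha_port(conf: dict[str, str]) -> str:
    # Assumed Flink default for this option is "0": pick a random free port.
    return conf.get(HA_PORT_KEY, "0")


QUESTION_CONF = """\
high-availability: zookeeper
high-availability.storageDir: file:///opt/flink/data/
high-availability.zookeeper.quorum: zk-0.zk-hs:2181,zk-1.zk-hs:2181,zk-2.zk-hs:2181
high-availability.zookeeper.client.acl: open
high-availability.zookeeper.path.root: /flinkha
high-availability.cluster-id: /flink-job-service-kpi-ofcwy
"""

if __name__ == "__main__":
    port = effective_ha_port(parse_flink_conf(QUESTION_CONF))
    if port == "0":
        print("HA JobManager RPC port is random; no fixed port can be exposed")
    else:
        print(f"HA JobManager RPC port (range): {port}")
```

Splitting on the first colon only is the design choice that matters here: values such as the storage URI and the ZooKeeper quorum contain colons themselves.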

Below is the error log:

2020-06-19 12:56:02,254 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Recovering checkpoints from ZooKeeper.
2020-06-19 12:56:02,293 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 0 checkpoints in ZooKeeper.
2020-06-19 12:56:02,293 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to fetch 0 checkpoints from storage.
2020-06-19 12:56:02,312 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/00000000000000000000000000000000/job_manager_lock'}.
2020-06-19 12:56:02,454 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManager runner for job KPI service job (00000000000000000000000000000000) was granted leadership with session id 9644799b-29cf-4ec5-9e68-5e45261aefb2 at akka.tcp://flink@flink-job-service-kpi-ofcwy:35817/user/jobmanager_0.
2020-06-19 12:56:02,532 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2020-06-19 12:56:02,534 INFO org.apache.flink.runtime.jobmaster.JobMaster - Starting execution of job KPI service job (00000000000000000000000000000000) under job master id 9e685e45261aefb29644799b29cf4ec5.
2020-06-19 12:56:02,552 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job KPI service job (00000000000000000000000000000000) switched from state CREATED to RUNNING.
2020-06-19 12:56:02,575 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: KPI-Kafka-Consumer -> (Sink: Print to Std. Out, Filter -> KPI Query Map -> KPI Unwind -> KPI Custom Map -> KPI filter -> KPI Data Transformation -> Filter) (1/1) (6aeaf74d5a4ee58579e79fa1d3026535) switched from CREATED to SCHEDULED.
2020-06-19 12:56:02,618 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{4abf5ce93cd365168228b616bd80ed71}]
2020-06-19 12:56:02,634 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Process -> Flat Map (1/1) (4ac2344f71fb9b6beb4a42fe18cf77a2) switched from CREATED to SCHEDULED.
2020-06-19 12:56:02,636 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingProcessingTimeWindows(60000), ProcessingTimeTrigger, DistinctCountAggregateFunction, PassThroughWindowFunction) -> Map (1/1) (1fbb13647621f5e48db6f7d750c32865) switched from CREATED to SCHEDULED.
2020-06-19 12:56:02,636 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> (Sink: Unnamed, Sink: Print to Std. Out) (1/1) (46396671fce9498171d03a31b1cee968) switched from CREATED to SCHEDULED.
2020-06-19 12:56:02,655 INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager akka.tcp://flink@flink-job-service-kpi-ofcwy:35817/user/resourcemanager(82039211570997fc83bd52bafb394879)
2020-06-19 12:56:02,674 INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration
2020-06-19 12:56:02,677 INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)
2020-06-19 12:56:02,692 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/00000000000000000000000000000000/job_manager_lock.
2020-06-19 12:56:02,693 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registering job manager 9e685e45261aefb29644799b29cf4ec5@akka.tcp://flink@flink-job-service-kpi-ofcwy:35817/user/jobmanager_0 for job 00000000000000000000000000000000.
2020-06-19 12:56:02,753 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registered job manager 9e685e45261aefb29644799b29cf4ec5@akka.tcp://flink@flink-job-service-kpi-ofcwy:35817/user/jobmanager_0 for job 00000000000000000000000000000000.
2020-06-19 12:56:02,775 INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: 82039211570997fc83bd52bafb394879.
2020-06-19 12:56:02,775 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Requesting new slot [SlotRequestId{4abf5ce93cd365168228b616bd80ed71}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
2020-06-19 12:56:02,777 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 00000000000000000000000000000000 with allocation id dcc3d3f3537cd3f1032fe47a0aafe577.
2020-06-19 12:56:40,983 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: KPI-Kafka-Consumer -> (Sink: Print to Std. Out, Filter -> KPI Query Map -> KPI Unwind -> KPI Custom Map -> KPI filter -> KPI Data Transformation -> Filter) (1/1) of job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint.
2020-06-19 12:57:40,982 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: KPI-Kafka-Consumer -> (Sink: Print to Std. Out, Filter -> KPI Query Map -> KPI Unwind -> KPI Custom Map -> KPI filter -> KPI Data Transformation -> Filter) (1/1) of job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint.
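Two details in this log are worth checking together: the JobManager publishes itself at akka.tcp://flink@flink-job-service-kpi-ofcwy:35817 (an apparently random port), and every task stays SCHEDULED because the slot request is never fulfilled, so each checkpoint is aborted. A minimal Python sketch that extracts both facts from log text, assuming the log line format shown above; the function names are hypothetical and this is a diagnostic aid, not a Flink tool.

```python
from __future__ import annotations

import re

# Hypothetical diagnostic helpers, assuming log lines formatted like the ones above.
# JobManager Akka endpoint, e.g. akka.tcp://flink@host:35817/user/jobmanager_0
AKKA_JM = re.compile(r"akka\.tcp://flink@([\w.-]+):(\d+)/user/jobmanager_\d+")
# Task-level state changes, e.g. "(4ac2...77a2) switched from CREATED to SCHEDULED".
# Job-level lines read "switched from state X to Y" and therefore do not match.
TRANSITION = re.compile(r"\(([0-9a-f]{32})\) switched from (\w+) to (\w+)")


def jobmanager_endpoint(log: str) -> tuple[str, int] | None:
    """Return (host, port) of the first JobManager endpoint in the log, if any."""
    m = AKKA_JM.search(log)
    return (m.group(1), int(m.group(2))) if m else None


def stuck_in_scheduled(log: str) -> list[str]:
    """Return execution attempt IDs whose most recent logged state is SCHEDULED."""
    last_state: dict[str, str] = {}
    for attempt_id, _old, new in TRANSITION.findall(log):
        last_state[attempt_id] = new  # later lines overwrite earlier ones
    return [a for a, state in last_state.items() if state == "SCHEDULED"]


if __name__ == "__main__":
    # Two fragments copied verbatim from the log above.
    excerpt = (
        "was granted leadership with session id 9644799b-29cf-4ec5-9e68-5e45261aefb2 "
        "at akka.tcp://flink@flink-job-service-kpi-ofcwy:35817/user/jobmanager_0.\n"
        "Process -> Flat Map (1/1) (4ac2344f71fb9b6beb4a42fe18cf77a2) "
        "switched from CREATED to SCHEDULED.\n"
    )
    print(jobmanager_endpoint(excerpt))  # ('flink-job-service-kpi-ofcwy', 35817)
    print(stuck_in_scheduled(excerpt))   # ['4ac2344f71fb9b6beb4a42fe18cf77a2']
```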

I solved it by fixing the service configuration. The following setting was missing:

high-availability.jobmanager.port: 6070
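Pinning the port is likely only half of the fix: the Kubernetes Service in front of the JobManager presumably also has to expose it, so that the TaskManager can reach the address the JobManager publishes in ZooKeeper. A hedged Python sketch that builds a minimal Service manifest as a dict and prints it as JSON (which kubectl also accepts). The Service name, the selector labels, and every port other than 6070 are assumptions modelled on common Flink-on-Kubernetes setups, not taken from the question.

```python
from __future__ import annotations

import json

# Hypothetical sketch: expose the pinned HA RPC port through the JobManager Service.
HA_RPC_PORT = 6070  # value of high-availability.jobmanager.port from the answer


def jobmanager_service(name: str, ports: dict[str, int]) -> dict:
    """Build a minimal Service manifest (type defaults to ClusterIP) for named ports."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": "flink", "component": "jobmanager"},  # assumed labels
            "ports": [
                {"name": pname, "port": port, "targetPort": port}
                for pname, port in ports.items()
            ],
        },
    }


def exposed_ports(service: dict) -> set[int]:
    """Return the set of service ports declared in the manifest."""
    return {p["port"] for p in service["spec"]["ports"]}


if __name__ == "__main__":
    svc = jobmanager_service(
        "flink-job-service-kpi-ofcwy",  # hostname seen in the log; assumed Service name
        {"rpc": 6123, "blob": 6124, "ui": 8081, "ha-rpc": HA_RPC_PORT},  # assumed ports
    )
    assert HA_RPC_PORT in exposed_ports(svc)
    print(json.dumps(svc, indent=2))  # e.g. pipe into: kubectl apply -f -
```

Keeping port and targetPort equal avoids any translation between the address the JobManager advertises and the port the Service forwards to.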
