
Flink cluster on EKS

I am new to Flink and Kubernetes. I am planning to create a Flink streaming job that streams data from a FileSystem to Kafka.

I have the Flink job jar, which works fine (tested locally). Now I am trying to host this job in Kubernetes, and would like to use EKS on AWS.

I have read through the official Flink documentation on how to set up a Flink cluster: https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/kubernetes.html

I tried to set it up locally using minikube, brought up a session cluster, and submitted the job, which works fine.

My questions: 1) Out of the two options, job cluster and session cluster: since the job is a streaming job that should keep monitoring the filesystem and stream any new files to the destination, can I use a job cluster in this case? As per the documentation, a job cluster executes the job and terminates once it is completed; if the job monitors a folder, does it ever complete?

2) I have a Maven project that builds the Flink jar. I would like to know the ideal way to spin up a session/job cluster using this jar in production. What is the normal CI/CD process? Should I build a session cluster initially and submit jobs to it whenever needed, or spin up a job cluster with the built jar?

First off, the link that you provided is for Flink 1.5. If you are starting fresh, I'd recommend using Flink 1.9 or the upcoming 1.10.

For your questions:

1) A job with a file monitor never terminates. It cannot know when no more files will arrive, so you have to cancel it manually. A job cluster is fine for that.
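
To illustrate 1), here is a minimal sketch of such a never-terminating job using the Flink 1.9 DataStream API. The class name, input directory, Kafka broker address, and topic name are placeholder assumptions rather than values from your setup:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.source.FileProcessingMode;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

    public class FileToKafkaJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Placeholder input directory; point this at the filesystem you want to monitor.
            String inputDir = "s3://my-bucket/input/";
            TextInputFormat format = new TextInputFormat(new Path(inputDir));

            // PROCESS_CONTINUOUSLY re-scans the directory every 10s and emits new files,
            // so the job runs until you cancel it; it never completes on its own.
            DataStream<String> lines = env.readFile(
                    format, inputDir, FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L);

            // Placeholder broker and topic.
            Properties kafkaProps = new Properties();
            kafkaProps.setProperty("bootstrap.servers", "kafka:9092");
            lines.addSink(new FlinkKafkaProducer<>(
                    "output-topic", new SimpleStringSchema(), kafkaProps));

            env.execute("file-to-kafka");
        }
    }

A job cluster built around a jar like this simply runs until you cancel the job or tear down the cluster.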

2) There is no clear answer to that, and it's also not Flink-specific. Everyone has a different solution with different drawbacks.

I'd aim for a semi-automatic approach, where everything is automated but you need to explicitly press a deploy button (and not just do a git push). Oftentimes, these CI/CD pipelines deploy on a test cluster first and run a smoke test before allowing a deploy to production.

If you are completely fresh, you could check out AWS CodeDeploy. However, I have had good experiences with GitLab and an AWS runner.

The normal process would be something like:

  • build
  • integration/e2e tests on build machine (dockerized)
  • deploy on test cluster/preprod cluster
  • run smoke tests
  • deploy on prod

I have also seen processes that go straight to prod and invest the time in better monitoring and fast rollback instead of a preprod cluster and smoke tests. Whether that is viable usually depends on how business-critical the process is and how expensive a reprocessing would be.
