
Flink cluster on EKS

Original post: 2020-01-08 16:33:39. Tags: java, amazon-web-services, kubernetes, apache-flink, amazon-eks

I am new to Flink and Kubernetes. I am planning to create a Flink streaming job that streams data from a file system to Kafka.

I have the Flink job jar, which works fine (tested locally). Now I am trying to host this job in Kubernetes, and would like to use EKS on AWS.

I have read through the official Flink documentation on how to set up a Flink cluster: https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/kubernetes.html

I set it up locally using minikube, brought up a session cluster, and submitted the job, which works fine.

My questions:

1) Of the two options, job cluster and session cluster: my job is a streaming job that should keep monitoring the file system and, whenever new files come in, stream them to the destination. Can I use a job cluster in this case? As per the documentation, a job cluster executes the job and terminates once it is completed. If the job monitors a folder, does it ever complete?

2) I have a Maven project that builds the Flink jar. What is the ideal way to spin up a session/job cluster using this jar in production? What is the normal CI/CD process? Should I build a session cluster initially and submit jobs whenever needed, or spin up a job cluster with the built jar?

1 Answer

First off, the link that you provided is for Flink 1.5. If you are starting fresh, I'd recommend using Flink 1.9 or the upcoming 1.10.

For your questions:

1) A job with a file monitor never terminates. It cannot know when no more files will arrive, so you have to cancel it manually. A job cluster is fine for that.
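To make the "never terminates" point concrete, here is a minimal plain-Java sketch of a polling directory monitor. It is only an illustration and not Flink's implementation or API: the class `DirectoryMonitor` and its methods `scanOnce`, `run`, and `cancel` are hypothetical names made up for this example. The point is that an empty scan carries no information about the future, so the loop can only end when something external cancels it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical illustration of a continuously monitoring source; not part of Flink. */
final class DirectoryMonitor {
    private final Path dir;
    private final Set<Path> seen = new HashSet<>();
    private final AtomicBoolean running = new AtomicBoolean(true);

    DirectoryMonitor(Path dir) {
        this.dir = dir;
    }

    /** Performs one scan and returns the regular files not seen in any earlier scan. */
    List<Path> scanOnce() throws IOException {
        List<Path> fresh = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                // Set.add returns false for files that were already reported.
                if (Files.isRegularFile(p) && seen.add(p)) {
                    fresh.add(p);
                }
            }
        }
        return fresh;
    }

    /**
     * Polls the directory until cancel() is called. An empty scan is never
     * treated as "finished", because a new file may still arrive later.
     */
    void run(long intervalMillis, Consumer<Path> sink)
            throws IOException, InterruptedException {
        while (running.get()) {
            for (Path p : scanOnce()) {
                sink.accept(p);
            }
            Thread.sleep(intervalMillis);
        }
    }

    /** The only way the loop in run() ends: an explicit, external cancel. */
    void cancel() {
        running.set(false);
    }
}
```

In this sketch the cluster type makes no difference to the loop: whichever process hosts `run` simply stays up until `cancel` is called, which mirrors why a job cluster for such a job keeps running until the job is cancelled.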

2) There is no clear answer to that, and it's also not Flink-specific. Everyone has a different solution with different drawbacks.

I'd aim for a semi-automatic approach, where everything is automated but you need to explicitly press a deploy button (and not just do a git push). Often, these CI/CD pipelines deploy to a test cluster first and run a smoke test before allowing a deploy to production.

If you are completely new to this, you could check out AWS CodeDeploy. However, I have had good experiences with GitLab and AWS runners.

The normal process would be something like:

build
integration/e2e tests on build machine (dockerized)
deploy on test cluster/preprod cluster
run smoke tests
deploy on prod
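The steps above, combined with the explicit deploy button mentioned earlier, can be sketched in plain Java as an ordered list of stages that stops at the first failure. This is only a toy model under stated assumptions, not how GitLab or AWS CodeDeploy is configured: `DeployPipeline`, `stage`, `run`, and `example` are hypothetical names, and each stage is reduced to a boolean result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical model of a gated deployment pipeline; not a real CI/CD API. */
final class DeployPipeline {
    // LinkedHashMap keeps the stages in the order they were registered.
    private final Map<String, BooleanSupplier> stages = new LinkedHashMap<>();

    DeployPipeline stage(String name, BooleanSupplier step) {
        stages.put(name, step);
        return this;
    }

    /** Runs the stages in order, stops at the first failing one, and returns the stages that succeeded. */
    List<String> run() {
        List<String> done = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> e : stages.entrySet()) {
            if (!e.getValue().getAsBoolean()) {
                break;
            }
            done.add(e.getKey());
        }
        return done;
    }

    /** Builds the five stages from the list; the booleans stand in for real outcomes. */
    static DeployPipeline example(boolean smokeTestsPass, boolean deployButtonPressed) {
        return new DeployPipeline()
                .stage("build", () -> true)
                .stage("integration/e2e tests", () -> true)
                .stage("deploy on test cluster", () -> true)
                .stage("smoke tests", () -> smokeTestsPass)
                // Production is gated on an explicit manual approval, not on the push itself.
                .stage("deploy on prod", () -> deployButtonPressed);
    }
}
```

With this model, `DeployPipeline.example(true, false).run()` completes everything up to the smoke tests and then waits short of production, which is the semi-automatic behaviour described above.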

I have also seen processes that go to prod quickly and invest the time in better monitoring and a fast rollback instead of a preprod cluster and smoke tests. That's usually viable for processes that are not business-critical, and it depends on how expensive reprocessing is.