
Scalable Spring Batch job on Kubernetes

I am developing an ETL batch application using Spring Batch. My ETL process takes data from a pagination-based REST API and loads it into Google BigQuery. I would like to deploy this batch application in a Kubernetes cluster and take advantage of pod scalability. I understand that Spring Batch supports both horizontal and vertical scaling. I have a few questions:

1) How do I deploy this ETL app on Kubernetes so that it creates pods on demand using remote chunking / remote partitioning?

2) I am assuming there would be a main master pod and several slave pods provisioned based on load. Is that correct?

3) There is also a Kubernetes batch API. Should I use the Kubernetes batch API or Spring Cloud features? Which option is better?

I have used Spring Boot with Spring Batch and Spring Cloud Task to do something similar to what you want to do. Maybe it will help you.

It works like this: I have a manager app that deploys pods running my master application on Kubernetes. The master application does some work and then starts remote partitioning, deploying several other pods with "workers".
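To illustrate the fan-out/fan-in shape of this flow, here is a minimal, stdlib-only Java sketch in which threads from an `ExecutorService` stand in for worker pods. It is not the answerer's implementation and uses no Spring or Kubernetes API: the class `MasterWorkerSimulation`, the method `runMaster`, and the `recordsOnPage` function (a stand-in for fetching one REST page and loading it into BigQuery) are all hypothetical. The master hands each worker a contiguous page range, waits for every worker to finish, and fails if any worker fails, which is roughly how a partitioned step waits on its worker step executions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

/**
 * Hypothetical local simulation of the master/worker flow: each thread plays
 * the role of one worker pod. No Spring or Kubernetes API is used here.
 */
final class MasterWorkerSimulation {

    /** Fans page ranges out to workers, then waits for all of them (fan-in). */
    static long runMaster(int totalPages, int workers, IntUnaryOperator recordsOnPage) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            int pagesPerWorker = (totalPages + workers - 1) / workers; // ceiling division
            for (int from = 0; from < totalPages; from += pagesPerWorker) {
                final int start = from;
                final int end = Math.min(from + pagesPerWorker, totalPages); // exclusive
                // Each submitted task stands in for one worker pod handling its pages.
                results.add(pool.submit(() -> {
                    long loaded = 0;
                    for (int page = start; page < end; page++) {
                        loaded += recordsOnPage.applyAsInt(page); // "extract + load" one page
                    }
                    return loaded;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that worker is done; rethrows its failure
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("master interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker partition failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 10 pages of 100 records each, spread across 3 simulated worker pods
        System.out.println(runMaster(10, 3, page -> 100)); // prints 1000
    }
}
```

In the real setup each task would instead be a separately launched pod, so the master would typically track the workers' status through the shared job repository rather than holding `Future`s in memory.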

Trying to answer your questions:

1) You can create a Docker image of an application that has a Spring Batch job. Let's call it the master application. The application that deploys the master application could use a TaskLauncher or an AppDeployer from Spring Cloud Deployer Kubernetes.

2) Correct. In this case you could use remote partitioning. Each partition would be handled by another pod running a Docker image with the Job; this would be your worker. An example of remote partitioning can be found here.
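For a paginated REST source, the main decision in remote partitioning is how to split the pages among workers. The sketch below is a hypothetical, stdlib-only illustration of the kind of logic a Spring Batch `Partitioner` (whose `partition(int gridSize)` method returns one `ExecutionContext` per partition) might contain. Plain `Map`s stand in for `ExecutionContext` so it runs without Spring on the classpath; the class name `PagePartitioner`, the `totalPages` input, and the `fromPage`/`toPage` keys are assumptions, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: splits the pages of a paginated REST API into at most
 * gridSize contiguous partitions. In Spring Batch this would live in a
 * Partitioner and each inner map would be an ExecutionContext.
 */
final class PagePartitioner {

    /** Returns partition name -> {"fromPage", "toPage"} (0-based, inclusive). */
    static Map<String, Map<String, Integer>> partition(int totalPages, int gridSize) {
        if (totalPages < 0 || gridSize <= 0) {
            throw new IllegalArgumentException("totalPages must be >= 0 and gridSize > 0");
        }
        Map<String, Map<String, Integer>> partitions = new LinkedHashMap<>();
        int base = totalPages / gridSize;      // pages every partition gets
        int remainder = totalPages % gridSize; // the first 'remainder' partitions get one extra
        int next = 0;
        for (int i = 0; i < gridSize; i++) {
            int size = base + (i < remainder ? 1 : 0);
            if (size == 0) {
                break; // fewer pages than workers: skip empty partitions
            }
            Map<String, Integer> context = new LinkedHashMap<>();
            context.put("fromPage", next);
            context.put("toPage", next + size - 1);
            partitions.put("partition" + i, context);
            next += size;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // 10 pages across 3 worker pods
        partition(10, 3).forEach((name, ctx) ->
                System.out.println(name + " -> pages " + ctx.get("fromPage") + ".." + ctx.get("toPage")));
    }
}
```

Running `main` prints `partition0 -> pages 0..3`, `partition1 -> pages 4..6` and `partition2 -> pages 7..9`. Each worker would then read its own `fromPage`/`toPage` values and request only those pages; spreading the remainder over the first partitions keeps partition sizes within one page of each other.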

3) In my case I used Spring Batch and managed to do everything I needed. The only problems I have now are with upscaling and downscaling my cluster. Since my workers are not stateful, I'm experiencing some problems when instances are removed from the cluster. If you don't need to upscale or downscale your cluster, you are good to go.

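Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.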

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM