
Kubernetes - Best way to initialize persistent volume for a database deployment

I have a Neo4j service, but before the deployment starts up, I need to pre-fill it with data (about 2 GB). Currently, I wrote a Kubernetes Job that transforms the data from a CSV and formats it for the database using the neo4j-admin tool. It saves the formatted data to a persistent volume. After waiting for the Job to complete, I mount the volume in the Neo4j container, and the container is effectively read-only on this data for the rest of its life.
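For concreteness, here is roughly what the Job looks like, sketched with the Go client types (the claim name, image tag, CSV paths and import flags below are illustrative, not my exact setup):

```go
package prefill

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createImportJob submits a one-shot Job that runs neo4j-admin import
// and writes the formatted store onto the shared PVC. The CSV files are
// assumed to already be available inside the image or another mount.
func createImportJob(ctx context.Context, cs *kubernetes.Clientset) error {
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "neo4j-import"},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "import",
						Image: "neo4j:4.4",
						Command: []string{"neo4j-admin", "import",
							"--nodes=/csv/nodes.csv", "--database=neo4j"},
						VolumeMounts: []corev1.VolumeMount{{
							Name: "data", MountPath: "/data",
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "data",
						VolumeSource: corev1.VolumeSource{
							PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
								ClaimName: "neo4j-data", // placeholder PVC name
							},
						},
					}},
				},
			},
		},
	}
	_, err := cs.BatchV1().Jobs("default").Create(ctx, job, metav1.CreateOptions{})
	return err
}
```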

Is there a better way to do this more automatically?

I don't want to have to wait for the Job to complete before running another command to create the Neo4j deployment. I looked into initContainers, but they aren't suitable because I don't want to redo the data filling when a pod is re-created; I just want subsequent pods to read from the same persistent volume. Is there a way to wait for the Job to complete first?

Since a Job can't natively spawn new objects once it has finished (and, if it exits gracefully, using PreStop to invoke further actions won't work), you might want to monitor the API objects instead.

Programmatically accessing the API to determine when the Job is finished, and then creating your Deployment object, might be a feasible, automated way to do it.
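A minimal sketch of that pattern with client-go is below, assuming it runs inside the cluster. It polls the Job's status until its Complete condition is true and then creates the Deployment, remounting the same claim read-only; the namespace, object names, image tag and mount paths are assumptions:

```go
package main

import (
	"context"
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// jobComplete reports whether the Job carries a true Complete condition.
func jobComplete(job *batchv1.Job) bool {
	for _, c := range job.Status.Conditions {
		if c.Type == batchv1.JobComplete && c.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}

func main() {
	cfg, err := rest.InClusterConfig() // assumes running in a pod
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Poll the pre-fill Job until it reports completion.
	for {
		job, err := cs.BatchV1().Jobs("default").Get(ctx, "neo4j-import", metav1.GetOptions{})
		if err != nil {
			panic(err)
		}
		if jobComplete(job) {
			break
		}
		time.Sleep(10 * time.Second)
	}

	// Job finished: create the Neo4j Deployment against the same PVC.
	replicas := int32(1)
	labels := map[string]string{"app": "neo4j"}
	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "neo4j"},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "neo4j",
						Image: "neo4j:4.4",
						VolumeMounts: []corev1.VolumeMount{{
							Name: "data", MountPath: "/data", ReadOnly: true,
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "data",
						VolumeSource: corev1.VolumeSource{
							PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
								ClaimName: "neo4j-data", ReadOnly: true,
							},
						},
					}},
				},
			},
		},
	}
	if _, err := cs.AppsV1().Deployments("default").Create(ctx, dep, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("deployment created")
}
```

A watch on the Job would avoid the polling loop, but a simple poll keeps the sketch short.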

Doing it this way, you don't have to worry about redoing the data processing with initContainers, as you can essentially create the Deployment and remount the already existing volume.

Also, using the official Go library allows you to run either within the cluster, in a pod, or externally.
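The usual pattern for that flexibility is to try the in-cluster service-account configuration first and fall back to a local kubeconfig; the kubeconfig path here is an assumption:

```go
package main

import (
	"os"
	"path/filepath"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// loadConfig picks the right client configuration: the in-cluster
// service account when running in a pod, otherwise ~/.kube/config.
func loadConfig() (*rest.Config, error) {
	if cfg, err := rest.InClusterConfig(); err == nil {
		return cfg, nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	return clientcmd.BuildConfigFromFlags("", filepath.Join(home, ".kube", "config"))
}

func main() {
	cfg, err := loadConfig()
	if err != nil {
		panic(err)
	}
	_ = kubernetes.NewForConfigOrDie(cfg) // same clientset either way
}
```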

I assume that your neo4j application data won't be updated from your Neo4j deployment, since you said the deployment loads the volume as read-only.

If that is the case, why do you want Kubernetes to do the data loading? Use object storage like S3 or Azure Data Lake, and ensure that there is some data workflow pipeline that will update the object storage. There are many tools that provide data pipeline features, such as Oozie or Airflow.

In your deployment, you can then refer to the object storage via a Persistent Volume Claim.
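If a claim backed directly by the object store isn't available, the loading step of such a pipeline could be as simple as copying the prebuilt store from the bucket onto the pod's volume. A sketch with the AWS SDK for Go, where the region, bucket, key and target path are all hypothetical:

```go
package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	// Credentials come from the usual SDK chain (env, IAM role, etc.).
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))

	// /data is assumed to be the PVC mount inside the pod.
	f, err := os.Create("/data/neo4j-store.tar.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Stream the prebuilt store archive from the bucket onto the volume.
	n, err := s3manager.NewDownloader(sess).Download(f, &s3.GetObjectInput{
		Bucket: aws.String("my-graph-data"),      // hypothetical bucket
		Key:    aws.String("neo4j/store.tar.gz"), // hypothetical key
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("downloaded %d bytes", n)
}
```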
