
Mount a different volume for each job in Kubernetes

We have a process that takes one input: a directory location (an NFS share path). The process reads from that directory, does some processing, and writes back to it.

Due to the nature of the contents, directory and process permissions are set so that the process can access only that directory and nothing else. The process is short-lived (~1 minute), and there may be hundreds of thousands of invocations each day, each on a different directory.

We are trying to move this workload to a Docker/Kubernetes environment. One approach I can think of is:

  1. Create a PersistentVolume for the directory.
  2. Create a PersistentVolumeClaim and bind it.
  3. Mount that PVC in the pod specification for the job.
  4. Once the job completes, delete the PV, the PVC, and the job.
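The four steps above can be sketched as manifest generation. This is a minimal illustration, not a production tool: the NFS server, image, command, mount path, and the `pv-`/`pvc-`/`job-` naming scheme are all hypothetical placeholders, and `job_id` is assumed to already be a valid lowercase Kubernetes name fragment. The PVC sets `volumeName` so it binds to exactly its own PV, and the PV uses the `Retain` reclaim policy so deleting it never touches the NFS data.

```python
import json

# Hypothetical placeholders: substitute your own NFS server, image and command.
NFS_SERVER = "nfs.example.com"
IMAGE = "my-process-image:latest"
COMMAND = "my_process_start_command"
WORK_DIR = "/work"


def per_job_objects(job_id: str, nfs_path: str) -> list:
    """Steps 1-3: a PV for the directory, a PVC pre-bound to it, a Job mounting the PVC.

    job_id is assumed to be a valid lowercase DNS-1123 name fragment.
    """
    pv_name, pvc_name = f"pv-{job_id}", f"pvc-{job_id}"
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": "1Gi"},  # nominal size; NFS does not enforce it
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",  # deleting the PV keeps the data
            "storageClassName": "",  # static PV, no dynamic provisioning
            "nfs": {"server": NFS_SERVER, "path": nfs_path},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "",
            "volumeName": pv_name,  # bind to exactly this PV (step 2)
            "resources": {"requests": {"storage": "1Gi"}},
        },
    }
    job = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"job-{job_id}"},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed run automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "worker",
                        "image": IMAGE,
                        "command": [COMMAND, WORK_DIR],
                        "volumeMounts": [{"name": "data", "mountPath": WORK_DIR}],
                    }],
                    "volumes": [{
                        "name": "data",
                        "persistentVolumeClaim": {"claimName": pvc_name},  # step 3
                    }],
                }
            },
        },
    }
    return [pv, pvc, job]


def manifest_json(job_id: str, nfs_path: str) -> str:
    """Wrap the three objects in a v1 List so one `kubectl apply -f -` creates them all."""
    return json.dumps({"apiVersion": "v1", "kind": "List",
                       "items": per_job_objects(job_id, nfs_path)}, indent=2)


def teardown_argv(job_id: str) -> list:
    """Step 4: delete the Job (its pods are cascade-deleted by default), the PVC and the PV."""
    return ["kubectl", "delete", f"job/job-{job_id}", f"pvc/pvc-{job_id}", f"pv/pv-{job_id}"]


if __name__ == "__main__":
    print(manifest_json("abc123", "/exports/dir-abc123"))  # hypothetical id and export path
```

Piping the output into `kubectl apply -f -` creates the three objects; running `teardown_argv(...)` through `subprocess.run` after the Job finishes removes them. Counting it out makes the overhead concern concrete: every invocation means three API objects created and three deleted, plus a volume mount and unmount on the node.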

Looking at these steps, I think this might be overkill, or at least a lot of overhead (many objects created in k8s, and the underlying volume mounted and unmounted on the host for every job).

Any other ideas?

If I understand your setup correctly, there are several approaches, each with its own pros and cons. I'll try to list some of the ideas:

  • As you noted, you can create all of the resources (PV, PVC, ...) each time. I wouldn't worry about the "lot of objects" in k8s, but this approach can introduce significant overhead and an execution-time penalty: if each run of your process really takes only 1-2 seconds, binding, starting, and tearing down can add exactly that overhead. The pro is better isolation and concurrency; the con is the added overhead.

  • Another approach is to lay out the directory structure like so:

      /root_of_raw_data
       |
       +-- /process_folder_1
       |
       +-- /process_folder_2
       |
       ...
       |
       +-- /process_folder_n

    and then create a PV and a PVC that point only to /root_of_raw_data, assuming your provisioner supports ReadWriteMany (an NFS provisioner should). You then don't spend any time setting up or tearing down the PV/PVC (they stay bound permanently). On each pod start, you use subPath to mount /process_folder_x (where x corresponds to that particular process) at, say, /my_process_work_folder inside the pod, and then start the process with /my_process_work_folder. The pro is that you avoid the PV/PVC binding overhead; the con is that you still have pod startup/teardown overhead.

  • Yet another approach is to keep the same directory structure as above, but instead of using subPath to mount each process folder into its own pod, you mount the whole /root_of_raw_data folder at, say, /my_root_work_folder inside a pod. You then start the process with /my_root_work_folder/process_folder_x (again, x being tied to the process in question). This way you can leave the pod running all the time (or several pods if needed, again provided ReadWriteMany is available for the PV), and instead of starting and tearing down pods you simply call kubectl -n my-process-namespace exec -it my-process-pod-name -- my_process_start_command /my_root_work_folder/process_folder_y (the -- separates kubectl's own arguments from the command to run in the pod). The pro is that you have no start/stop overhead at all; the con is that you have pod(s) running constantly, and they all share the same process root folder.
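For the second approach (one permanently bound ReadWriteMany claim, one subPath per job), a Job manifest might look like the sketch below. It assumes a PVC named `raw-data-pvc` already bound to an NFS PV rooted at /root_of_raw_data; that claim name, the image, and the Job naming scheme are hypothetical. Note that in the manifest `subPath` is written relative to the volume root (`process_folder_x`, without the leading slash), since Kubernetes rejects absolute subPath values. The optional `ttlSecondsAfterFinished` field lets Kubernetes garbage-collect finished Jobs (stable since Kubernetes 1.23).

```python
import json

# Assumption: "raw-data-pvc" (hypothetical name) is already bound, ReadWriteMany,
# to an NFS PV whose root is /root_of_raw_data.
SHARED_PVC = "raw-data-pvc"
IMAGE = "my-process-image:latest"  # hypothetical image
MOUNT_PATH = "/my_process_work_folder"


def subpath_job(x: int) -> dict:
    """A Job whose container sees only /root_of_raw_data/process_folder_<x>."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"process-{x}"},
        "spec": {
            "backoffLimit": 0,
            "ttlSecondsAfterFinished": 300,  # auto-delete the finished Job after 5 minutes
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "worker",
                        "image": IMAGE,
                        "command": ["my_process_start_command", MOUNT_PATH],
                        "volumeMounts": [{
                            "name": "raw-data",
                            "mountPath": MOUNT_PATH,
                            # Relative to the volume root; no leading slash allowed.
                            "subPath": f"process_folder_{x}",
                        }],
                    }],
                    # The same long-lived claim is reused by every Job.
                    "volumes": [{
                        "name": "raw-data",
                        "persistentVolumeClaim": {"claimName": SHARED_PVC},
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(subpath_job(7), indent=2))  # pipe into `kubectl apply -f -`
```

Only one object (the Job) is created per invocation here, which is where the saving over the per-job PV/PVC approach comes from; pod scheduling and startup still happen every time.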

You can also build variations on these approaches using Jobs if you need pod logs, or, alternatively, build schedulers around pod usage, and so on. This answer is mainly meant to give you some other angles on eliminating the potential setup/teardown overhead; it is by no means an exhaustive list of approaches.
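As one example of a scheduler around pod usage, here is a minimal round-robin dispatcher for the long-running-pod approach. It is a sketch under stated assumptions: the namespace and pod names are hypothetical, every pod is assumed to mount /root_of_raw_data at /my_root_work_folder, and runs are executed sequentially with no retries. The `-it` flags are dropped because a scripted caller has no terminal attached.

```python
import itertools
import subprocess

NAMESPACE = "my-process-namespace"
# Hypothetical long-running pods, each mounting /root_of_raw_data at /my_root_work_folder.
PODS = ["my-process-pod-0", "my-process-pod-1"]
ROOT = "/my_root_work_folder"


def exec_argv(pod: str, folder: str) -> list:
    """kubectl exec command for one invocation; '--' ends kubectl's own arguments."""
    return ["kubectl", "-n", NAMESPACE, "exec", pod, "--",
            "my_process_start_command", f"{ROOT}/{folder}"]


def dispatch(folders, pods=PODS, run=False):
    """Assign folders to pods round-robin.

    With run=False, return the (folder, argv) plan; with run=True, execute each
    command in turn and return {folder: exit code}.
    """
    plan = [(folder, exec_argv(pod, folder))
            for folder, pod in zip(folders, itertools.cycle(pods))]
    if not run:
        return plan
    results = {}
    for folder, argv in plan:
        # check=False: one failing run should not abort the rest of the batch.
        results[folder] = subprocess.run(argv, check=False).returncode
    return results


if __name__ == "__main__":
    for folder, argv in dispatch(["process_folder_1", "process_folder_2", "process_folder_3"]):
        print(folder, " ".join(argv))
```

A real dispatcher would run invocations concurrently and track pod health, but the shape stays the same: no Kubernetes objects are created or deleted per run, only an exec into an already-running pod.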

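Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.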

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM