
MLflow Kubernetes Pod Deployment

I'm attempting to create a Kubernetes pod that will run the MLflow tracking server and store the MLflow artifacts in a designated S3 location. Below is what I'm attempting to deploy:

Dockerfile:

FROM python:3.7.0

RUN pip install mlflow==1.0.0
RUN pip install boto3
RUN pip install awscli --upgrade --user

ENV AWS_MLFLOW_BUCKET aws_mlflow_bucket
ENV AWS_ACCESS_KEY_ID aws_access_key_id
ENV AWS_SECRET_ACCESS_KEY aws_secret_access_key

COPY run.sh /

ENTRYPOINT ["/run.sh"]

# docker build -t seedjeffwan/mlflow-tracking-server:1.0.0 .
# 1.0.0 is current mlflow version

run.sh:

#!/bin/sh

set -e

# Fail fast if any required configuration is missing
if [ -z "$FILE_DIR" ]; then
  echo >&2 "FILE_DIR must be set"
  exit 1
fi

if [ -z "$AWS_MLFLOW_BUCKET" ]; then
  echo >&2 "AWS_MLFLOW_BUCKET must be set"
  exit 1
fi

if [ -z "$AWS_ACCESS_KEY_ID" ]; then
  echo >&2 "AWS_ACCESS_KEY_ID must be set"
  exit 1
fi

if [ -z "$AWS_SECRET_ACCESS_KEY" ]; then
  echo >&2 "AWS_SECRET_ACCESS_KEY must be set"
  exit 1
fi

# Store run metadata under FILE_DIR and artifacts in the S3 bucket
mkdir -p "$FILE_DIR" && mlflow server \
    --backend-store-uri "$FILE_DIR" \
    --default-artifact-root "s3://${AWS_MLFLOW_BUCKET}" \
    --host 0.0.0.0 \
    --port 5000

mlflow.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-tracking-server
  namespace: default
spec:
  selector:
    matchLabels:
      app: mlflow-tracking-server
  replicas: 1
  template:
    metadata:
      labels:
        app: mlflow-tracking-server
    spec:
      containers:
      - name: mlflow-tracking-server
        image: seedim/mlflow-tracker-service:v1
        ports:
        - containerPort: 5000
        env:
        # FILE_DIR cannot be the mount dir itself; MLflow needs an empty dir, but the mount dir contains lost+found
        - name: FILE_DIR
          value: /mnt/mlflow/manifest
        - name: AWS_MLFLOW_BUCKET
          value: <aws_s3_bucket>
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-secret
              key: AWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-secret
              key: AWS_SECRET_ACCESS_KEY
        volumeMounts:
        - mountPath: /mnt/mlflow
          name: mlflow-manifest-storage
      volumes:
        - name: mlflow-manifest-storage
          persistentVolumeClaim:
            claimName: mlflow-manifest-pvc

---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-tracking-server
  namespace: default
  labels:
    app: mlflow-tracking-server
spec:
  ports:
  - port: 5000
    protocol: TCP
  selector:
    app: mlflow-tracking-server

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mlflow-manifest-pvc
  namespace: default
spec:
  storageClassName: gp2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

I then build the Docker image, load it into the minikube environment, and attempt to run it in a Kubernetes pod.
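
For reference, the build-and-deploy steps are roughly the following (the tag is illustrative and has to match the image: field in mlflow.yaml):

# Build the image against minikube's Docker daemon so the cluster can use it locally
eval $(minikube docker-env)
docker build -t seedjeffwan/mlflow-tracking-server:1.0.0 .

# Create the tracking server Deployment, Service and PVC
kubectl apply -f mlflow.yaml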

When I try this, I get a CrashLoopBackOff error for the pod running the image, and 'pod has unbound immediate PersistentVolumeClaims' for the pod created with the YAML.

I'm attempting to follow the information here ( https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/07_Experiment_Tracking/07_02_MLFlow.ipynb ).

Is there anything noticeable that I'm doing wrong in this situation?

Thank you

The issue here is related to the PersistentVolumeClaim, which is not being provisioned by your minikube cluster.
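
You can confirm this by describing the claim and checking which storage classes the cluster actually offers (minikube normally only ships its standard hostPath provisioner, not AWS's gp2), for example:

# Show why the claim is not bound (look for the Pending status and the events)
kubectl describe pvc mlflow-manifest-pvc -n default

# List available storage classes; gp2 will most likely be missing on minikube
kubectl get storageclass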

You will need to decide whether to switch to a platform-managed Kubernetes service, or to stick with minikube and satisfy the PersistentVolumeClaim manually (or with an alternative solution).
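
If you stay on minikube, one manual way to satisfy the claim is a static hostPath PersistentVolume that matches the claim's storageClassName, size and access mode; a minimal sketch (the hostPath path is just an example) could look like this:

# Static PV that the existing mlflow-manifest-pvc (gp2, 2Gi, ReadWriteOnce) can bind to
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mlflow-manifest-pv
spec:
  storageClassName: gp2
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/mlflow-manifest
EOF

Alternatively, changing the claim's storageClassName from gp2 to minikube's default standard class lets the built-in provisioner create the volume for you.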

The simplest option would be to use a Helm chart for the MLflow installation, like this one or this one.

The first Helm chart lists the following requirements:

Prerequisites

  • Kubernetes cluster 1.10+
  • Helm 2.8.0+
  • PV provisioner support in the underlying infrastructure.

Just like the guide you followed, this one requires PV provisioner support.

So by switching to EKS you will most likely have an easier time deploying MLflow with artifact storage in S3.

If you wish to stay on minikube, you will need to modify the Helm chart values or the YAML files from the guide you linked to be compatible with your manual PV configuration. It might also need permissions configuration for S3.
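
On the S3 side, your Deployment already expects a Secret called aws-secret with the two credential keys; something along these lines would create it (the values are placeholders):

# Create the Secret referenced by the Deployment's secretKeyRef entries
kubectl create secret generic aws-secret -n default \
  --from-literal=AWS_ACCESS_KEY_ID='<aws_access_key_id>' \
  --from-literal=AWS_SECRET_ACCESS_KEY='<aws_secret_access_key>'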

The second Helm chart documents the following limitations:

Known limitations of this Chart

I've created this Chart to use it in a production-ready environment in my company. We are using MLFlow with a Postgres backend store.

Therefore, the following capabilities have been left out of the Chart:

  • Using persistent volumes as a backend store.
  • Using other database engines like MySQL or SQLServer.

You can try to install it on minikube. This setup would store the MLflow tracking data in a remote database; it would still need tweaking in order to store the artifacts in S3.
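
For reference, that combination corresponds to running the tracking server with a database backend store and an S3 artifact root, roughly like this (the Postgres host, credentials and database name are placeholders, and the image needs a Postgres driver such as psycopg2-binary installed):

pip install psycopg2-binary   # driver for the Postgres backend store

mlflow server \
    --backend-store-uri postgresql://mlflow_user:mlflow_pass@postgres-host:5432/mlflow \
    --default-artifact-root s3://${AWS_MLFLOW_BUCKET} \
    --host 0.0.0.0 \
    --port 5000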

In any case, minikube is a lightweight Kubernetes distribution targeted mainly at learning, so you will eventually hit another limitation if you stick with it for too long.

Hope it helps.
