Multiple values for a single parameter in the mlflow run command

Question

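I'm just getting started with mlflow and would like to know how to pass multiple values to each parameter in the mlflow run command.

The goal is to pass a dictionary as param_grid to GridSearchCV to perform cross-validation.

In my main code, I use argparse to retrieve the command-line arguments. By adding nargs='+' in add_argument(), I can write space-separated values for each hyperparameter and then apply vars() to create the dictionary. See the code below: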

import argparse

from sklearn.ensemble import RandomForestClassifier

# Build the parameters for the command line
param_names = list(RandomForestClassifier().get_params().keys())

# Param types in the same order they appear in param_names by using get_params()
# (this order depends on the installed scikit-learn version)
param_types = [bool, float, dict, str, int, float, int, float, float, float,
               float, float, float, int, int, bool, int, int, bool]

# Allow for only optional command-line arguments
parser = argparse.ArgumentParser()
grid_group = parser.add_argument_group('param_grid_group')
for i, p in enumerate(param_names):
    grid_group.add_argument(f'--{p}', type=param_types[i], nargs='+')

# Create a param_grid to be passed to GridSearchCV
param_grid_unprocessed = vars(parser.parse_args())

This works with a classic python command:

python my_code.py --max_depth 2 3 4 --n_estimators 400 600 1000

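As I said, here I can write space-separated values for each hyperparameter, and the code above does the magic by grouping the values into lists and returning the dictionary below, which I can then pass to GridSearchCV: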

{'max_depth':[2, 3, 4], 'n_estimators':[400, 600, 1000]}
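
One detail worth noting: vars(parser.parse_args()) also contains every option that was not given on the command line, with the value None. Below is a minimal sketch of dropping those entries so that only the supplied lists remain. It declares just three options for brevity, and build_param_grid is a hypothetical helper name, not part of the original script.

```python
import argparse


def build_param_grid(argv=None):
    """Parse list-valued options and keep only the ones actually supplied."""
    parser = argparse.ArgumentParser()
    # Only three hyperparameters are declared to keep the sketch short.
    parser.add_argument('--max_depth', type=int, nargs='+')
    parser.add_argument('--n_estimators', type=int, nargs='+')
    parser.add_argument('--min_samples_leaf', type=int, nargs='+')
    args = vars(parser.parse_args(argv))
    # Options that were not passed come back as None, so filter them out.
    return {name: values for name, values in args.items() if values is not None}


# Equivalent to: python my_code.py --max_depth 2 3 4 --n_estimators 400 600 1000
param_grid = build_param_grid(
    ['--max_depth', '2', '3', '4', '--n_estimators', '400', '600', '1000']
)
```

Passing an explicit argument list to parse_args() is only done here so the sketch can be run without a real command line; with argv=None it reads sys.argv as usual.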

However, with the mlflow run command I can't get this to work, because it accepts only one value per parameter. Here is my MLproject file:

name: mlflow_project

conda_env: conda.yml

entry_points:

  main:
    parameters:
      max_depth: int
      n_estimators: int
    command: "python my_code.py --max_depth {max_depth} --n_estimators {n_estimators}"

So this works:

mlflow run . -P max_depth=2 -P n_estimators=400

But this does not:

 mlflow run . -P max_depth=[2, 3, 4] -P n_estimators=[400, 600, 1000]

From the documentation, it seems this is not possible. So, is there any trick to get around this?

Thanks in advance!

Answer 1

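I have been working around this by passing a file name as the parameter and loading the information from that file inside the script. Not ideal, but it works. I would love to see what others have tried.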
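
A minimal sketch of that file-based workaround, assuming the grid is stored as JSON. The option name --param_file and the helper load_param_grid are hypothetical names chosen for illustration; the answer does not say which file format or names were actually used.

```python
import argparse
import json


def load_param_grid(argv=None):
    """Read a param_grid dictionary from the JSON file named on the command line."""
    parser = argparse.ArgumentParser()
    # --param_file is a hypothetical option name chosen for this sketch.
    parser.add_argument('--param_file', required=True)
    args = parser.parse_args(argv)
    # The file is assumed to hold a mapping such as
    # {"max_depth": [2, 3, 4], "n_estimators": [400, 600, 1000]}
    with open(args.param_file) as f:
        return json.load(f)
```

With this approach the MLproject entry point would presumably take a single parameter (for example param_file, declared with the path type) and use a command along the lines of "python my_code.py --param_file {param_file}", so that mlflow run only ever has to pass one value. The returned dictionary can then be handed to GridSearchCV as param_grid.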

Solution 1
0 2022-08-10 20:32:23
