How Do You "Permanently" Delete An Experiment In MLflow?
Permanently deleting an experiment isn't documented anywhere. I'm using MLflow with a Postgres backend database.
Here's what I ran:
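from mlflow.tracking import MlflowClient
client = MlflowClient(tracking_uri=server)  # server = URI of your tracking server
client.delete_experiment("1")  # experiment IDs are strings in the MlflowClient API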
This deletes the experiment, but when I then create a new experiment with the same name as the one I just deleted, it returns this error:
mlflow.exceptions.MlflowException: Cannot set a deleted experiment 'cross-sell' as the active experiment. You can restore the experiment, or permanently delete the experiment to create a new one.
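If all you need is to reuse the name, the error message itself points at the simpler route: restore the soft-deleted experiment instead of purging it. A minimal sketch, assuming `client` is an `MlflowClient` and that `get_experiment_by_name` also returns experiments in the deleted lifecycle stage (worth confirming on your MLflow version); `get_or_restore_experiment` is a hypothetical helper name.

```python
from typing import Any


def get_or_restore_experiment(client: Any, name: str) -> str:
    """Return an experiment ID for `name`, restoring it first if it was soft-deleted.

    `client` is assumed to be an mlflow.tracking.MlflowClient (duck-typed here).
    """
    experiment = client.get_experiment_by_name(name)
    if experiment is None:
        # No experiment (active or deleted) has this name: create a fresh one.
        return client.create_experiment(name)
    if experiment.lifecycle_stage == "deleted":
        # A soft-deleted experiment still reserves its name; restoring reactivates it.
        client.restore_experiment(experiment.experiment_id)
    return experiment.experiment_id
```

After this, `mlflow.set_experiment(name)` should no longer raise the "deleted experiment" error, because the experiment is active again.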
I can't find anywhere in the documentation that shows how to delete everything permanently.
Unfortunately, it seems there is currently no way to do this via the UI or CLI :-/
How to do it depends on the type of backend store you are using.
File store:
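If you are using the filesystem as the storage mechanism (the default), this is easy. "Deleted" experiments are moved to a .trash folder. You just need to clear it out: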
rm -rf mlruns/.trash/*
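As of the current version of the docs (1.7.2), they note:
"It is recommended to use a cron job or an alternate workflow mechanism to clear the .trash folder."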
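For a cron job, the `rm -rf` above can also be done from Python. A minimal sketch, assuming the file store lives in a directory called `mlruns` relative to where the script runs (pass your real path); `empty_mlruns_trash` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def empty_mlruns_trash(mlruns_dir: str = "mlruns") -> int:
    """Permanently remove everything under <mlruns_dir>/.trash; return the number of entries removed."""
    trash = Path(mlruns_dir) / ".trash"
    if not trash.is_dir():
        return 0  # nothing deleted yet, or not a file-store layout
    removed = 0
    for entry in trash.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # a deleted experiment/run directory and its contents
        else:
            entry.unlink()  # stray files or symlinks
        removed += 1
    return removed
```

Only the contents of `.trash` are touched; the `.trash` directory itself and active experiments next to it are left in place.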
SQL database:
This is trickier, because the dependent rows have to be deleted first. I'm using MySQL, and these commands worked for me:
USE mlflow_db; # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM tags WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";
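Because these statements remove everything in the deleted stage at once, it can help to look at what they will hit first. A minimal sketch, assuming any DB-API 2.0 connection with a tuple-returning cursor (the default for pymysql, psycopg2 and sqlite3, but not for pymysql's `DictCursor`); `list_deleted_experiments` is a hypothetical helper name.

```python
from typing import Any, List, Tuple


def list_deleted_experiments(conn: Any) -> List[Tuple[Any, str]]:
    """Return (experiment_id, name) for every experiment in the 'deleted' lifecycle stage.

    Single-quoted string literals are accepted by MySQL, PostgreSQL and SQLite alike.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT experiment_id, name FROM experiments "
            "WHERE lifecycle_stage = 'deleted' ORDER BY experiment_id"
        )
        return [(row[0], row[1]) for row in cur.fetchall()]
    finally:
        cur.close()
```

If the list contains something you still want, restore it before running the DELETEs.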
As of MLflow 1.11.0, the recommended way to permanently delete runs within an experiment is: mlflow gc [OPTIONS].
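From the docs, mlflow gc will "permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs."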
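To call `mlflow gc` from a Python maintenance script, a sketch along these lines should work. It assumes the `mlflow` CLI is on `PATH` and uses only `--backend-store-uri` and `--run-ids` (comma-separated), both available since 1.11.0; without `--run-ids`, gc is expected to process all runs in the deleted stage. `build_gc_command` and `run_gc` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_gc_command(backend_store_uri: str, run_ids: Optional[Sequence[str]] = None) -> List[str]:
    """Build the argument list for `mlflow gc`."""
    cmd = ["mlflow", "gc", "--backend-store-uri", backend_store_uri]
    if run_ids:
        # --run-ids takes a single comma-separated string of run IDs.
        cmd += ["--run-ids", ",".join(run_ids)]
    return cmd


def run_gc(backend_store_uri: str, run_ids: Optional[Sequence[str]] = None) -> None:
    """Run `mlflow gc`; raises if the CLI is missing or exits with an error."""
    if shutil.which("mlflow") is None:
        raise RuntimeError("the `mlflow` CLI is not on PATH")
    subprocess.run(build_gc_command(backend_store_uri, run_ids), check=True)
```

For example, `run_gc("postgresql://user:password@localhost/mlflow")`, where the URI is a placeholder for your own backend store.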
I'll add the SQL commands for permanently emptying MLflow's trash if you are using PostgreSQL as the backend store.
Switch to your MLflow database, e.g. with: \c mlflow
Then:
DELETE FROM experiment_tags WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM tags WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM params WHERE run_uuid=ANY(
SELECT run_uuid FROM runs where experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';
The difference is that I added a DELETE command for the "params" table.
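All of the SQL variants in this thread are the same sequence in the same order: experiment tags, then every table keyed by `run_uuid`, then `runs`, then `experiments`, so no row is removed while another row still references it. A sketch that generates that sequence, assuming the table set used here (newer MLflow schema versions may have further tables referencing runs, so check yours); `build_purge_statements` and `RUN_CHILD_TABLES` are hypothetical names.

```python
from typing import List

# Tables keyed by runs.run_uuid, as used in the answers in this thread (assumption: may be incomplete).
RUN_CHILD_TABLES = ["latest_metrics", "metrics", "tags", "params"]


def build_purge_statements(style: str = "any") -> List[str]:
    """Return the DELETE statements in foreign-key-safe order.

    style="any": col = ANY(subquery), as in the PostgreSQL/MySQL answers.
    style="in":  col IN (subquery), which SQLite needs and PostgreSQL/MySQL also accept.
    """
    if style not in ("any", "in"):
        raise ValueError("style must be 'any' or 'in'")

    def match(col: str, subquery: str) -> str:
        return f"{col} = ANY({subquery})" if style == "any" else f"{col} IN ({subquery})"

    deleted_exps = "SELECT experiment_id FROM experiments WHERE lifecycle_stage = 'deleted'"
    deleted_runs = f"SELECT run_uuid FROM runs WHERE {match('experiment_id', deleted_exps)}"
    stmts = [f"DELETE FROM experiment_tags WHERE {match('experiment_id', deleted_exps)};"]
    stmts += [f"DELETE FROM {t} WHERE {match('run_uuid', deleted_runs)};" for t in RUN_CHILD_TABLES]
    stmts.append(f"DELETE FROM runs WHERE {match('experiment_id', deleted_exps)};")
    stmts.append("DELETE FROM experiments WHERE lifecycle_stage = 'deleted';")
    return stmts
```

Printing `"\n".join(build_purge_statements("any"))` gives a script you can review and paste into psql.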
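Expanding on @Lee Netherton's answer, you can run these queries with PyMySQL to remove all remaining metadata from the MLflow tracking server after deleting an experiment through the MLflow tracking client.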
import pymysql
import pymysql.cursors


def perm_delete_exp():
    connection = pymysql.connect(
        host='localhost',
        user='user',
        password='password',
        db='mlflow',
        cursorclass=pymysql.cursors.DictCursor)
    # Child tables first, then runs, then experiments, so no foreign key is violated.
    # params is included because its rows reference runs.run_uuid.
    queries = """
    USE mlflow;
    DELETE FROM experiment_tags WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
    DELETE FROM latest_metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
    DELETE FROM metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
    DELETE FROM tags WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
    DELETE FROM params WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
    DELETE FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
    DELETE FROM experiments where lifecycle_stage="deleted";
    """
    try:
        with connection.cursor() as cursor:
            # Execute one statement per line, skipping the blank lines around the string.
            for query in queries.splitlines():
                if query.strip():
                    cursor.execute(query.strip())
        connection.commit()
    finally:
        connection.close()
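You can (and perhaps should) execute the whole query at once, but I found it easier to debug this way.
Unfortunately, in my case the SQL commands above did not work with SQLite. Here is the SQLite version I ran in a database IDE, replacing "ANY" with "IN":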
DELETE FROM experiment_tags WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid in (
SELECT run_uuid FROM runs WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM metrics WHERE run_uuid in (
SELECT run_uuid FROM runs WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM tags WHERE run_uuid in (
SELECT run_uuid FROM runs WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM params WHERE run_uuid in (
SELECT run_uuid FROM runs where experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';
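The same SQLite statements can be run from Python's built-in `sqlite3` module inside a single transaction, so a failure part-way leaves the database unchanged. A minimal sketch, assuming the same table set as the SQL above; the `mlflow.db` path in the usage line and the names `PURGE_SQL` and `purge_deleted_sqlite` are hypothetical. The child-first order is kept in case foreign-key enforcement (`PRAGMA foreign_keys = ON`) is enabled.

```python
import sqlite3

# Same statements as above, child tables first, experiments last.
DELETED_EXPS = "SELECT experiment_id FROM experiments WHERE lifecycle_stage = 'deleted'"
DELETED_RUNS = f"SELECT run_uuid FROM runs WHERE experiment_id IN ({DELETED_EXPS})"
PURGE_SQL = [
    f"DELETE FROM experiment_tags WHERE experiment_id IN ({DELETED_EXPS})",
    f"DELETE FROM latest_metrics WHERE run_uuid IN ({DELETED_RUNS})",
    f"DELETE FROM metrics WHERE run_uuid IN ({DELETED_RUNS})",
    f"DELETE FROM tags WHERE run_uuid IN ({DELETED_RUNS})",
    f"DELETE FROM params WHERE run_uuid IN ({DELETED_RUNS})",
    f"DELETE FROM runs WHERE experiment_id IN ({DELETED_EXPS})",
    "DELETE FROM experiments WHERE lifecycle_stage = 'deleted'",
]


def purge_deleted_sqlite(conn: sqlite3.Connection) -> None:
    """Run every purge statement in one transaction; roll back all of them if any fails."""
    with conn:  # commits on success, rolls back if a statement raises (does not close conn)
        for stmt in PURGE_SQL:
            conn.execute(stmt)
```

Usage: back up the database file, then `purge_deleted_sqlite(sqlite3.connect("mlflow.db"))`.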
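If you are using S3 as the artifact store and an EC2 server for tracking, here is my workaround for deleting a complete experiment "folder".
Delete complete experiments on S3, given a list of experiment IDs: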
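import subprocess

YOUR_BUCKET_URI = "s3://your-bucket/your-artifact-root"  # placeholder: your MLflow artifact root on S3


def permanently_delete_mlflow_experiments(list_of_experiment_ids: list):
    # loop over the experiment ids you want to delete
    for experiment_id in list_of_experiment_ids:
        print(f'deleting experiment {experiment_id}')
        # delete every object under <bucket uri>/<experiment_id>/ via the AWS CLI
        subprocess.run(["aws", "s3", "rm", YOUR_BUCKET_URI, "--recursive",
                        "--exclude", "*", "--include", f"{experiment_id}/*"], check=True)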
Delete specific runs, given a list of run IDs:
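import subprocess
from mlflow.tracking import MlflowClient

YOUR_MLFLOW_TRACKING_URI = "http://your-ec2-host:5000"  # placeholder: your tracking server
YOUR_BUCKET_URI = "s3://your-bucket/your-artifact-root"  # placeholder: your MLflow artifact root on S3

def permanently_delete_runs_on_mlflow(list_of_runs_id: list):
    mlflow_client = MlflowClient(tracking_uri=YOUR_MLFLOW_TRACKING_URI)
    for run_id in list_of_runs_id:
        # retrieve experiment id corresponding to the run id
        experiment_id = mlflow_client.get_run(run_id).info.experiment_id
        print(f'deleting run {run_id} from experiment {experiment_id}')
        subprocess.run(["aws", "s3", "rm", YOUR_BUCKET_URI, "--recursive", "--exclude", "*",
                        "--include", f"{experiment_id}/{run_id}/*"], check=True)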
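Note that you need the AWS CLI installed for this to work.
It basically runs an AWS CLI command from Python to do the deletion. As a side note, MLflow tracking on EC2 creates "folders" on S3 named after each experiment ID, containing a "subfolder" for every run ID of that experiment. The code above relies on this structure.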
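If you would rather not depend on that folder layout, MLflow records where each run's artifacts live in `run.info.artifact_uri`. A sketch that builds the matching `aws s3 rm` command from that URI, assuming an `s3://` artifact store; `s3_rm_command` is a hypothetical helper name. It removes only the run's artifacts, not its metadata in the tracking database, so pair it with one of the metadata purges above.

```python
from typing import List


def s3_rm_command(artifact_uri: str) -> List[str]:
    """Build an `aws s3 rm --recursive` command for one run's artifact root.

    `artifact_uri` is expected to come from MlflowClient().get_run(run_id).info.artifact_uri.
    """
    if not artifact_uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {artifact_uri!r}")
    # Normalise to exactly one trailing slash so the target is clearly the run's artifact "folder".
    prefix = artifact_uri.rstrip("/") + "/"
    return ["aws", "s3", "rm", prefix, "--recursive"]
```

Usage: `subprocess.run(s3_rm_command(mlflow_client.get_run(run_id).info.artifact_uri), check=True)`.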