How to log an ALS model within an MLflow run?

I am currently working on a Databricks cluster, trying to log an ALS model within an MLflow run. Trying several different approaches, I either get a TypeError "cannot pickle '_thread.RLock' object" that stops my run, or an OSError "No such file or directory: '/tmp/tmpxiznhskj/sparkml'" that does not stop the run but leaves me unable to load the model back into my code.

Below is some preparation code:
import mlflow
import logging
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark import SparkContext, SparkConf
data = [{"User": 1, "Item": 1, "Rating": 1},
        {"User": 2, "Item": 2, "Rating": 3},
        {"User": 3, "Item": 3, "Rating": 1},
        {"User": 4, "Item": 2, "Rating": 4},
        {"User": 1, "Item": 2, "Rating": 3},
        {"User": 2, "Item": 3, "Rating": 2},
        {"User": 2, "Item": 4, "Rating": 1},
        {"User": 4, "Item": 1, "Rating": 5}]
conf = SparkConf().setAppName("ALS-mlflow-test")
sc = SparkContext.getOrCreate(conf)
rdd = sc.parallelize(data)
df_rating = rdd.toDF()
(df_train, df_test) = df_rating.randomSplit([0.8, 0.2])
logging.getLogger("mlflow").setLevel(logging.DEBUG)
with mlflow.start_run() as run:
    model_als = ALS(maxIter=5, regParam=0.01, userCol="User", itemCol="Item", ratingCol="Rating",
                    implicitPrefs=False, coldStartStrategy="drop")
    model_als.fit(df_train)
    mlflow.sklearn.log_model(model_als, artifact_path="test")
This results in the following error:

_SklearnCustomModelPicklingError: Pickling custom sklearn model ALS failed when saving model: cannot pickle '_thread.RLock' object

As a second attempt, I wrapped the model in a custom mlflow.pyfunc.PythonModel:
class MyModel(mlflow.pyfunc.PythonModel):
    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        return self.my_custom_function(model_input)

    def my_custom_function(self, model_input):
        return 0
with mlflow.start_run():
    model_als = ALS(maxIter=5, regParam=0.01, userCol="User", itemCol="Item", ratingCol="Rating",
                    implicitPrefs=False, coldStartStrategy="drop")
    my_model = MyModel(model_als)
    model_info = mlflow.pyfunc.log_model(artifact_path="model", python_model=my_model)
This results in a more general error, but essentially the same one as in the first attempt:

TypeError: cannot pickle '_thread.RLock' object

As a third attempt, I built a Spark Pipeline around the model:
from pyspark.ml import Pipeline

with mlflow.start_run() as run:
    model_als = ALS(maxIter=5, regParam=0.01, userCol="User", itemCol="Item", ratingCol="Rating",
                    implicitPrefs=False, coldStartStrategy="drop")
    pipeline = Pipeline(stages=[model_als])
    pipeline_model = pipeline.fit(df_train)
    mlflow.spark.log_model(pipeline_model, artifact_path="test-pipeline")
This time the code executed, but looking at the debug logs revealed an error:
stderr: Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2023/01/05 08:54:22 INFO mlflow.spark: File '/tmp/tmpxiznhskj/sparkml' not found on DFS. Will attempt to upload the file.
Traceback (most recent call last):
File "/databricks/python/lib/python3.9/site-packages/mlflow/utils/_capture_modules.py", line 162, in <module>
main()
File "/databricks/python/lib/python3.9/site-packages/mlflow/utils/_capture_modules.py", line 137, in main
mlflow.pyfunc.load_model(model_path)
File "/databricks/python/lib/python3.9/site-packages/mlflow/pyfunc/__init__.py", line 484, in load_model
model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/databricks/python/lib/python3.9/site-packages/mlflow/utils/_capture_modules.py", line 134, in _load_pyfunc_patch
return original(*args, **kwargs)
File "/databricks/python/lib/python3.9/site-packages/mlflow/spark.py", line 832, in _load_pyfunc
return _PyFuncModelWrapper(spark, _load_model(model_uri=path))
File "/databricks/python/lib/python3.9/site-packages/mlflow/spark.py", line 727, in _load_model
model_uri = _HadoopFileSystem.maybe_copy_from_uri(model_uri, dfs_tmpdir)
File "/databricks/python/lib/python3.9/site-packages/mlflow/spark.py", line 404, in maybe_copy_from_uri
return cls.maybe_copy_from_local_file(_download_artifact_from_uri(src_uri), dst_path)
File "/databricks/python/lib/python3.9/site-packages/mlflow/tracking/artifact_utils.py", line 100, in _download_artifact_from_uri
return get_artifact_repository(artifact_uri=root_uri).download_artifacts(
File "/databricks/python/lib/python3.9/site-packages/mlflow/store/artifact/local_artifact_repo.py", line 79, in download_artifacts
raise IOError("No such file or directory: '{}'".format(local_artifact_path))
OSError: No such file or directory: '/tmp/tmpxiznhskj/sparkml'
I then tried to load the model back with PipelineModel:

from pyspark.ml import PipelineModel

logged_model = 'runs:/xyz123/test'

# Load model
loaded_model = PipelineModel.load(logged_model)
Which results in the error:

org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "runs"
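Presumably this is because `PipelineModel.load` hands the URI straight to Hadoop, which only understands registered filesystem schemes (`file`, `dbfs`, `s3`, ...), whereas `runs:/` is an MLflow artifact URI that only MLflow's own loaders know how to resolve. A minimal, Spark-free sketch of what Hadoop sees:

```python
from urllib.parse import urlparse

# A "runs:/" URI as produced by an MLflow run
uri = "runs:/xyz123/test"

# Hadoop splits off the scheme and looks for a FileSystem registered under it;
# no Hadoop FileSystem exists for the scheme "runs", hence the exception
scheme = urlparse(uri).scheme
print(scheme)  # -> runs
```

So `runs:/...` URIs have to go through MLflow's loaders (e.g. `mlflow.spark.load_model`), never through Spark's or Hadoop's own readers.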
import mlflow
logged_model = 'runs:/xyz123/test'
# Load model
loaded_model = mlflow.spark.load_model(logged_model)
# Perform inference via model.transform()
loaded_model.transform(data)
Which results in the following error:

AttributeError: 'list' object has no attribute '_jdf'

(`transform` expects a Spark DataFrame here, but `data` is a plain Python list.)

Evaluating with `df_test` instead:
evaluator = RegressionEvaluator(metricName="rmse", labelCol="Rating", predictionCol="prediction")
df_pred = loaded_model.transform(df_test)
rmse = evaluator.evaluate(df_pred)
df_pred.display()
print("Root-mean-square error explicit = " + str(rmse))
user_recs = loaded_model.recommendForAllUsers(2)
user_recs.display()
To conclude: all I want to achieve is to log the provided ALS model within my MLflow run. I cannot figure out where it goes wrong, or what else I could try.

Thanks in advance!
As of MLflow 2.1.1, pyspark.ml.recommendation.ALS is not on the allowlist of supported models. Using a Pipeline, as you tested in #3, is the appropriate method for logging unsupported spark.ml models.
It looks like you may be running into an environment issue when logging the model, because I was able to run the following implementation both on Databricks and locally.
import mlflow
from mlflow.models.signature import infer_signature
from pyspark.ml import Pipeline
from pyspark.ml.recommendation import ALS

# get a bigger test split from your data
(df_train, df_test) = df_rating.randomSplit([0.6, 0.4])

with mlflow.start_run() as run:
    # initialize als model
    als = ALS(
        maxIter=5,
        regParam=0.01,
        userCol="User",
        itemCol="Item",
        ratingCol="Rating",
        implicitPrefs=False,
        coldStartStrategy="drop",
    )

    # build and fit pipeline
    pipeline = Pipeline(stages=[als])
    pipeline_model = pipeline.fit(df_train)

    # test predict and infer signature
    predictions = pipeline_model.transform(df_test)
    signature = infer_signature(df_train, predictions)

    # log model
    mlflow.spark.log_model(
        pipeline_model, artifact_path="spark-model", signature=signature
    )

mlflow.end_run()
Reloading for inference

On Databricks, you first need to move the model from the experiment into the model registry. This can be done through the UI or with the following commands.
# construct the model_uri from generated run_id and the set artifact_path
run_id = "2f9a5424b1f44435a9413a3e2762b8a9"
artifact_path = "spark-model"
model_uri = f"runs:/{run_id}/{artifact_path}"

# move the model into the registry
model_name = "als-model"
model_details = mlflow.register_model(model_uri=model_uri, name=model_name)

# load model
version = 1
model_uri = f"models:/{model_name}/{version}"
loaded_model = mlflow.spark.load_model(model_uri)

# test predict
loaded_model.transform(df_test).show()
+----+------+----+----------+
|Item|Rating|User|prediction|
+----+------+----+----------+
| 2| 3| 2| 1.0075595|
+----+------+----+----------+
Alternatively, you can also log the model directly to the model registry.
# log model
model_name = "als-model"
mlflow.spark.log_model(
    pipeline_model,
    artifact_path="spark-model",
    registered_model_name=model_name,
)

# load model
version = 1
model_uri = f"models:/{model_name}/{version}"
loaded_model = mlflow.spark.load_model(model_uri)

# test predict
loaded_model.transform(df_test).show()
I also tried wrapping both the ALS model and the pipeline in a custom pyfunc, and in both cases I got exactly the same error you did. I believe there is something non-serializable in the ALS model that prevents this...