
Error logging Spark model with mlflow to databricks registry, via databricks-connect

I'm trying to log a trained Spark model to mlflow using databricks-connect, and I want the model to end up in the Databricks Model Registry. For now, my code looks like this:

import mlflow
import mlflow.spark

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Users/xxxxx/experiment_name")

with mlflow.start_run(run_name="my_run") as _:
    mlflow.spark.log_model(my_spark_model, "my_model")
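As an aside, the call above only logs the model as a run artifact; to also create an entry in the Databricks Model Registry, `mlflow.spark.log_model` accepts a `registered_model_name` argument. A minimal sketch of what I mean (the helper name and the registry name `"my_registered_model"` are hypothetical; `model` is assumed to be a fitted `pyspark.ml` `PipelineModel`):

```python
def log_and_register(model, artifact_path="my_model",
                     registered_model_name="my_registered_model"):
    """Log a fitted Spark ML model and register it in the Model Registry."""
    # mlflow is imported lazily so the helper can be defined even where
    # mlflow is not installed; on a configured client it resolves normally.
    import mlflow
    import mlflow.spark

    mlflow.set_tracking_uri("databricks")
    with mlflow.start_run(run_name="my_run"):
        # registered_model_name creates (or adds a version to) the registry entry
        mlflow.spark.log_model(
            model,
            artifact_path=artifact_path,
            registered_model_name=registered_model_name,
        )
```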

When execution reaches the log_model line, it fails with the following stack trace:

22/07/21 11:05:03 WARN ProtoSerializer: Failed to deserialize remote exception java.io.InvalidClassException: failed to read class descriptor at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source) at java.io.ObjectInputStream.readClassDesc(Unknown Source) at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) at java.io.ObjectInputStream.readObject0(Unknown Source) at java.io.ObjectInputStream.readObject(Unknown Source) at java.io.ObjectInputStream.readObject(Unknown Source) at org.apache.spark.sql.util.ProtoSerializer.$anonfun$deserializeObject$1(ProtoSerializer.scala:6618) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at org.apache.spark.sql.util.ProtoSerializer.deserializeObject(ProtoSerializer.scala:6603) at org.apache.spark.sql.util.ProtoSerializer.deserializeException(ProtoSerializer.scala:6634) at com.databricks.service.SparkServiceRemoteFuncRunner.executeRPC(SparkServiceRemoteFuncRunner.scala:188) at com.databricks.service.SparkServiceRemoteFuncRunner.$anonfun$execute0$1(SparkServiceRemoteFuncRunner.scala:121) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.service.SparkServiceRemoteFuncRunner.withRetry(SparkServiceRemoteFuncRunner.scala:135) at com.databricks.service.SparkServiceRemoteFuncRunner.execute0(SparkServiceRemoteFuncRunner.scala:113) at com.databricks.service.SparkServiceRemoteFuncRunner.$anonfun$execute$1(SparkServiceRemoteFuncRunner.scala:86) at com.databricks.spark.util.Log4jUsageLogger.recordOperation(UsageLogger.scala:247) at com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:429) at com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:408) at com.databricks.service.SparkServiceRPCClientStub.recordOperation(SparkServiceRPCClientStub.scala:58) at com.databricks.service.SparkServiceRemoteFuncRunner.execute(SparkServiceRemoteFuncRunner.scala:78) at com.databricks.service.SparkServiceRemoteFuncRunner.execute$(SparkServiceRemoteFuncRunner.scala:67) at com.databricks.service.SparkServiceRPCClientStub.execute(SparkServiceRPCClientStub.scala:58) at com.databricks.service.SparkServiceRPCClientStub.fileSystemOperation(SparkServiceRPCClientStub.scala:297) at com.databricks.service.FSClient.send(FSClient.scala:51) at com.databricks.service.FSClient.getFileStatus(FSClient.scala:181) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426) at org.apache.spark.ml.util.FileSystemOverwrite.handleOverwrite(ReadWrite.scala:675) at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:167) at org.apache.spark.ml.PipelineModel$PipelineModelWriter.super$save(Pipeline.scala:344) at org.apache.spark.ml.PipelineModel$PipelineModelWriter.$anonfun$save$4(Pipeline.scala:344) at org.apache.spark.ml.MLEvents.withSaveInstanceEvent(events.scala:175) at org.apache.spark.ml.MLEvents.withSaveInstanceEvent$(events.scala:170) at org.apache.spark.ml.util.Instrumentation.withSaveInstanceEvent(Instrumentation.scala:43) at org.apache.spark.ml.PipelineModel$PipelineModelWriter.$anonfun$save$3(Pipeline.scala:344) at org.apache.spark.ml.PipelineModel$PipelineModelWriter.$anonfun$save$3$adapted(Pipeline.scala:344) at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:284) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:284) at org.apache.spark.ml.PipelineModel$PipelineModelWriter.save(Pipeline.scala:344) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380) at py4j.Gateway.invoke(Gateway.java:295) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:251) at java.lang.Thread.run(Unknown Source)

Caused by: java.lang.ClassNotFoundException: com.databricks.backend.daemon.data.common.InvalidMountException at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Unknown Source) at org.apache.spark.util.Utils$.classForName(Utils.scala:242) at org.apache.spark.sql.util.SparkServiceObjectInputStream.readResolveClassDescriptor(SparkServiceObjectInputStream.scala:60) at org.apache.spark.sql.util.SparkServiceObjectInputStream.readClassDescriptor(SparkServiceObjectInputStream.scala:55) ... 51 more

22/07/21 11:05:03 ERROR Instrumentation: com.databricks.service.SparkServiceRemoteException: com.databricks.backend.daemon.data.common.InvalidMountException: Error while using path /databricks/mlflow-tracking/000000000000000/0a0a0a0a0a0a0a0a0a0a/artifacts\experiment_name/sparkml for resolving path '/000000000000000/0a0a0a0a0a0a0a0a0a0a/artifacts\experiment_name/sparkml' within mount at '/databricks/mlflow-tracking'.

<...>

Caused by: java.io.IOException: No FileSystem for scheme: unsupported-access-mechanism-for-path--use-mlflow-client at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2Factory.createFileSystem(DatabricksFileSystemV2Factory.scala:124) at com.databricks.backend.daemon.data.filesystem.MountEntryResolver.$anonfun$resolve$1(MountEntryResolver.scala:67) at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:395) at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:484) at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:504) at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261) at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258) at com.databricks.common.util.locks.LoggedLock$.withAttributionContext(LoggedLock.scala:73) at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305) at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297) at com.databricks.common.util.locks.LoggedLock$.withAttributionTags(LoggedLock.scala:73) at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:479) at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:404) at com.databricks.common.util.locks.LoggedLock$.recordOperationWithResultTags(LoggedLock.scala:73) at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:395) at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:367) at com.databricks.common.util.locks.LoggedLock$.recordOperation(LoggedLock.scala:73) at com.databricks.common.util.locks.LoggedLock$.withLock(LoggedLock.scala:120) at com.databricks.common.util.locks.PerKeyLock.withLock(PerKeyLock.scala:36) at com.databricks.backend.daemon.data.filesystem.MountEntryResolver.resolve(MountEntryResolver.scala:64)

<...>

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

answer = 'xro1535'
gateway_client = <py4j.java_gateway.GatewayClient object at 0x0000023716B5BE20>
target_id = 'o1532', name = 'copyToLocalFile'

    def get_return_value(answer, gateway_client, target_id=None, name=None):
        """Converts an answer received from the Java gateway into a Python object.

        For example, string representation of integers are converted to Python
        integer, string representation of objects are converted to JavaObject
        instances, etc.

        :param answer: the string returned by the Java gateway
        :param gateway_client: the gateway client used to communicate with the
            Java Gateway. Only necessary if the answer is a reference (eg, object,
            list, map)
        :param target_id: the name of the object from which the answer comes from
            (eg, *object1* in `object1.hello()`). Optional.
        :param name: the name of the member from which the answer comes from
            (eg, *hello* in `object1.hello()`). Optional.
        """
        if is_error(answer)[0]:
            if len(answer) > 1:
                type = answer[1]
                value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
                if answer[1] == REFERENCE_TYPE:
                    raise Py4JJavaError(
                        "An error occurred while calling {0}{1}{2}.\n".
                        format(target_id, ".", name), value)

E   py4j.protocol.Py4JJavaError: An error occurred while calling o1532.copyToLocalFile.
E   : java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\itscarlayall\AppData\Local\Temp\tmpalmxdo16\model\sparkml\metadata_SUCCESS
E   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
E   at org.apache.hadoop.util.Shell.execCommand(Shell.java:869)
E   at org.apache.hadoop.util.Shell.execCommand(Shell.java:852)
E   at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
E   at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
E   at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
E   at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
E   at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
E   at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)

Solved it! I needed to install winutils. Apparently, even though databricks-connect ships execution off to the remote Databricks cluster, mlflow still performs some local filesystem operations when saving a Spark model 🤷‍♀️
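For anyone hitting the same thing on Windows: the fix amounts to downloading a winutils.exe build matching your Hadoop version, placing it in a `%HADOOP_HOME%\bin` directory, and making sure the JVM that pyspark spawns can find it. A minimal sketch, assuming winutils.exe was unpacked to `C:\hadoop\bin` (that path is an assumption, adjust to your install); this must run before the SparkSession is created:

```python
import os

# Hypothetical install location: winutils.exe must sit in %HADOOP_HOME%\bin
# for Hadoop's local-filesystem shims (the chmod call in the trace) to work
# on Windows.
os.environ["HADOOP_HOME"] = r"C:\hadoop"

# Prepend the bin directory so the JVM spawned by pyspark can find winutils.exe.
os.environ["PATH"] = (
    os.path.join(os.environ["HADOOP_HOME"], "bin")
    + os.pathsep
    + os.environ.get("PATH", "")
)
```

Setting `HADOOP_HOME` as a system-wide environment variable works just as well, and avoids having to remember to run this before every session.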
