How to serialize a PySpark Pipeline object?
I'm trying to serialize a PySpark Pipeline object so that it can be saved and retrieved later. I tried both the Python pickle library and PySpark's PickleSerializer; in both cases the dumps() call itself fails.
Here is the code snippet using the native pickle library:
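import pickle
from pyspark.ml.pipeline import Pipeline
# tokenizer, hashingTF and lr are pipeline stages defined earlier (not shown)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
with open('myfile', 'wb') as f:
    pickle.dump(pipeline, f, 2)  # this call raises the error shown below
with open('myfile', 'rb') as f:
    pipeline1 = pickle.load(f)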
Running it produces the following error:
py4j.protocol.Py4JError: An error occurred while calling o32.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:335)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
at py4j.Gateway.invoke(Gateway.java:252)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:785)
Is it possible to serialize PySpark Pipeline objects?
Technically speaking, you can easily pickle a Pipeline object:
from pyspark.ml.pipeline import Pipeline
import pickle
pickle.dumps(Pipeline(stages=[]))
## b'\x80\x03cpyspark.ml.pipeline\nPipeline\nq ...
What you cannot pickle are Spark Transformers and Estimators, which are only thin wrappers around JVM objects. If you really need this, you can wrap the pipeline construction in a function, for example:
from pyspark.ml.feature import Tokenizer

def make_pipeline():
    return Pipeline(stages=[Tokenizer(inputCol="text", outputCol="words")])

pickle.dumps(make_pipeline)
## b'\x80\x03c__ ...
but since it is just a piece of code and doesn't store any persistent data, it doesn't look particularly useful.
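To make the last point concrete, here is a minimal sketch of what pickling a function actually stores. It assumes the script is run directly, and it uses a hypothetical stand-in for make_pipeline that returns a plain dict instead of a Spark Pipeline, so that it runs without Spark; the placeholder stage name is made up for illustration.

```python
import pickle


def make_pipeline():
    # Hypothetical stand-in: a real version would build and return a
    # pyspark.ml Pipeline. A plain dict keeps this sketch runnable without Spark.
    return {"stages": ["tokenizer-placeholder"]}


payload = pickle.dumps(make_pipeline)

# The payload records the module and function name only, not the function body.
name_is_stored = b"make_pipeline" in payload
body_is_stored = b"tokenizer-placeholder" in payload
print(name_is_stored, body_is_stored)

# Unpickling looks the name up again and hands back the same function object,
# so it only works where make_pipeline can be found under the same name.
restored = pickle.loads(payload)
print(restored is make_pipeline)
```

In other words, the pickle is only a pointer back to the code: nothing about the stages or their parameters is written to the byte string, and the function must still be defined in the process that loads it.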