pickle.loads gives 'module' object has no attribute '<ClassName>' inside a PySpark pandas UDF
I am attempting to pickle and unpickle a class instance in a PySpark pandas UDF. Pickling works just fine outside of the UDF:
import base64
import pickle

class ExampleModel:
    pass

clf = ExampleModel()
pickled_val = base64.b64encode(pickle.dumps(clf))
clf2 = pickle.loads(base64.b64decode(pickled_val))
print(clf2)
# <__main__.ExampleModel instance at 0x7f04d7444780>
However, inside a pandas UDF I can access the ExampleModel class, but I cannot unpickle the string column.
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark_session.createDataFrame(
    [
        (1, pickled_val, '')
    ],
    ['id', 'txt', 'error']
)

@pandas_udf(df.schema, PandasUDFType.GROUPED_MAP)
def example_unpickle(pdf):
    # Step 1: check that the class itself is reachable inside the UDF.
    try:
        clf_obj = ExampleModel()
    except Exception as e:
        pdf.loc[:, 'error'] = "1:" + str(e)
        return pdf
    # Step 2: try to unpickle the base64-encoded value from the 'txt' column.
    try:
        clf3 = pickle.loads(base64.b64decode(pdf.iloc[0, 1]))
    except Exception as e:
        pdf.loc[:, 'error'] = "2: " + str(e)
    return pdf

df_clf = df\
    .groupby('id')\
    .apply(example_unpickle)
df_clf.show(truncate=False)
This gives the error:
AttributeError: 'module' object has no attribute 'ExampleModel'
+---+------------------------------------------------+--------------------------------------------------+
|id |txt                                             |error                                             |
+---+------------------------------------------------+--------------------------------------------------+
|1  |KGlfX21haW5fXwpFeGFtcGxlTW9kZWwKcDAKKGRwMQpiLg==|2: 'module' object has no attribute 'ExampleModel'|
+---+------------------------------------------------+--------------------------------------------------+
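To see where the error comes from, it can help to look inside the pickled value from the txt column. The sketch below decodes the base64 string shown in the table and lists the pickle opcodes without unpickling anything. The variable names are only for illustration. The payload refers to the class by name as `__main__ ExampleModel`; it does not contain the class definition itself, so `pickle.loads` has to find `ExampleModel` in whatever module is `__main__` in the process doing the unpickling.

```python
import base64
import pickletools

# The base64 string copied from the 'txt' column of the output above.
pickled_val = "KGlfX21haW5fXwpFeGFtcGxlTW9kZWwKcDAKKGRwMQpiLg=="
raw = base64.b64decode(pickled_val)

# Walk the opcodes without executing the pickle, keeping any string arguments.
string_args = [
    (opcode.name, arg)
    for opcode, arg, _pos in pickletools.genops(raw)
    if isinstance(arg, str)
]

# The class is stored as a "module name" reference, not as code.
for name, arg in string_args:
    print(name, arg)
```

If `__main__` inside the UDF is a different module from the script that defined the class (which is presumably the case on a Spark Python worker), that lookup fails with the AttributeError shown above.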
The solution was to move the class into a separate file and create an __init__.py in the same directory. Then import the class as:
from ExampleFileName import ExampleModel
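A minimal, Spark-free sketch of why the separate file helps: once the class lives in an importable module, the pickle refers to that module name instead of `__main__`. Here the module file is written to a temporary directory purely so the example is self-contained; in practice ExampleFileName.py would sit next to the job script. The temporary directory and the helper variable names are assumptions made for illustration.

```python
import base64
import importlib
import os
import pickle
import sys
import tempfile

# Create a stand-in for ExampleFileName.py in a temporary directory.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "ExampleFileName.py"), "w") as f:
    f.write("class ExampleModel(object):\n    pass\n")

# Make the directory importable, then import the class as in the answer.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
from ExampleFileName import ExampleModel

clf = ExampleModel()
payload = pickle.dumps(clf)

# The payload now names the module file, not __main__.
references_module = b"ExampleFileName" in payload
references_main = b"__main__" in payload
print(references_module, references_main)

# Round trip through base64, as in the question.
pickled_val = base64.b64encode(payload)
clf2 = pickle.loads(base64.b64decode(pickled_val))
print(type(clf2).__module__)

# On a cluster the executors must also be able to import the module, for
# example by shipping it with spark_session.sparkContext.addPyFile(...).
# That step is an assumption about the deployment and is not run here.
```

Any process that can import ExampleFileName can then unpickle the value, regardless of which script is running as `__main__`.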