pickle.loads gives 'module' object has no attribute '<ClassName>' inside a PySpark pandas UDF

I am attempting to pickle and unpickle a class instance in a PySpark pandas UDF. Pickling works fine outside of the UDF:

import base64
import pickle

class ExampleModel:
    pass

clf = ExampleModel()
pickled_val = base64.b64encode(pickle.dumps(clf))
clf2 = pickle.loads(base64.b64decode(pickled_val))
print(clf2)
# <__main__.ExampleModel instance at 0x7f04d7444780>

However, inside a pandas UDF, I can instantiate the ExampleModel class but cannot unpickle the string column.

df = spark_session.createDataFrame(
    [
        (1, pickled_val, '') 
    ],
    ['id', 'txt', 'error'] 
)

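from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf(df.schema, PandasUDFType.GROUPED_MAP)
def example_unpickle(pdf):
    # Step 1: check that the class itself is visible inside the UDF.
    try:
        clf_obj = ExampleModel()
    except Exception as e:
        pdf.loc[:, 'error'] = "1:" + str(e)
        return pdf

    # Step 2: try to unpickle the base64-encoded 'txt' column.
    try:
        clf3 = pickle.loads(base64.b64decode(pdf.iloc[0, 1]))
    except Exception as e:
        pdf.loc[:, 'error'] = "2: " + str(e)
        return pdf

    # Return the frame unchanged when both steps succeed.
    return pdf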

df_clf = df\
            .groupby('id')\
            .apply(example_unpickle)

df_clf.show(truncate = False)

This gives the error:

AttributeError: 'module' object has no attribute 'ExampleModel'

+---+------------------------------------------------+--------------------------------------------------+
|id |txt                                             |error                                             |
+---+------------------------------------------------+--------------------------------------------------+
|1  |KGlfX21haW5fXwpFeGFtcGxlTW9kZWwKcDAKKGRwMQpiLg==|2: 'module' object has no attribute 'ExampleModel'|
+---+------------------------------------------------+--------------------------------------------------+

The solution was to move the class into a separate file and create an __init__.py in the same directory.

Then import the class as:

from ExampleFileName import ExampleModel
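
A rough sketch of why this likely helps: pickle stores an instance by naming its class's module and class name, not by copying the class definition. Decoding the txt value from the output table above shows that the payload refers to __main__.ExampleModel, and inside a Spark worker __main__ is not the driver script, so the lookup fails. The temporary directory and the one-line module file written below are assumptions made only so that the example runs on its own; in a real job the file would be a normal source file that the workers can also import.

```python
import base64
import importlib
import os
import pickle
import sys
import tempfile

# The 'txt' value from the output table: a base64-encoded pickle.
payload = base64.b64decode("KGlfX21haW5fXwpFeGFtcGxlTW9kZWwKcDAKKGRwMQpiLg==")
print(payload)  # b'(i__main__\nExampleModel\np0\n(dp1\nb.'

# Hypothetical stand-in for the separate file: write ExampleFileName.py
# into a temporary directory and make that directory importable.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "ExampleFileName.py"), "w") as f:
    f.write("class ExampleModel(object):\n    pass\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

from ExampleFileName import ExampleModel

# The pickle now names the module "ExampleFileName" instead of "__main__",
# so any process that can import ExampleFileName can unpickle it.
pickled = pickle.dumps(ExampleModel())
restored = pickle.loads(pickled)
print(type(restored).__module__)  # ExampleFileName
```

The same check (looking for the module name inside the pickled bytes) is a quick way to confirm which module a worker will need to import before it can unpickle a value.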
