
pickle/joblib AttributeError: module '__main__' has no attribute 'thing' in pytest

I have built a custom sklearn pipeline, as follows:

pipeline = make_pipeline(
    SelectColumnsTransfomer(features_to_use),
    ToDummiesTransformer('feature_0', prefix='feat_0', drop_first=True,  dtype=bool), # Dummify customer_type
    ToDummiesTransformer('feature_1', prefix='feat_1'), # Dummify the feature
    ToDummiesTransformer('feature_2', prefix='feat_2'), # Dummify 
    ToDummiesTransformer('feature_3', prefix='feat_3'), # Dummify
)
pipeline.fit(df)

The classes SelectColumnsTransfomer and ToDummiesTransformer are custom sklearn steps that subclass BaseEstimator and TransformerMixin. To serialise this object I use

from sklearn.externals import joblib
joblib.dump(pipeline, 'data_pipeline.joblib')

but when I deserialise it with

pipeline = joblib.load('data_pipeline.joblib') 

I get AttributeError: module '__main__' has no attribute 'SelectColumnsTransfomer'.

I have read other similar questions and followed the instructions in this blog post, but couldn't solve the issue. I am copying and pasting the classes, and importing them in the code. If I create a simplified version of this exercise, the whole thing works. The problem occurs when I run some tests with pytest: it seems that pytest doesn't see my custom classes. In fact, the error also includes self = <sklearn.externals.joblib.numpy_pickle.NumpyUnpickler object at 0x7f821508a588>, module = '__main__', name = 'SelectColumnsTransfomer', which hints that the NumpyUnpickler doesn't see SelectColumnsTransfomer even though it is imported in the test.

My test code:

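import pytest
from sklearn.externals import joblib  # on scikit-learn >= 0.23 use "import joblib"
from app.pipeline import * # the pipeline objects 
                          # SelectColumnsTransfomer and ToDummiesTransformer 
                          # are here!


@pytest.fixture(scope="module")
def clf():
    # Load the pipeline that was serialised with joblib.dump
    pipeline = joblib.load("persistence/data_pipeline.joblib")
    return pipeline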

def test_fake(clf):
    assert True

OK, I found the problem. It has nothing to do with the issue explained in the blog post "Python: pickling and dealing with "AttributeError: 'module' object has no attribute 'Thing'"", as I originally thought. You can solve the problem by having the same code both pickle and unpickle the file. I was using a separate script (a Jupyter notebook) to pickle and a plain Python script to unpickle. When I did everything in the same class, it worked.
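A minimal sketch of why this can happen, assuming the error comes from pickle's usual behaviour of storing a class as a module name plus a class name rather than the class body. The module name producer_session and the class Thing are hypothetical stand-ins for the notebook session and the custom transformer, and plain pickle is used instead of joblib to keep the example dependency-free.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the session that created the pickle
# (in the question this role is played by the notebook's __main__).
producer = types.ModuleType("producer_session")
sys.modules["producer_session"] = producer

source = """
class Thing:
    def __init__(self, value):
        self.value = value
"""
# Define the class inside the stand-in module, so Thing.__module__
# becomes "producer_session".
exec(source, producer.__dict__)

payload = pickle.dumps(producer.Thing(42), protocol=2)

# The payload holds only a "<module>\n<name>\n" reference to the class.
assert b"producer_session\nThing\n" in payload

# Loading works while the module still exposes the class.
restored = pickle.loads(payload)

# Simulate a consumer whose module of that name lacks the class,
# which is the situation when __main__ is pytest's entry point.
del producer.Thing
try:
    pickle.loads(payload)
    raised = False
except AttributeError as exc:
    raised = True
    print(f"AttributeError: {exc}")
```

The exact wording of the message differs between unpicklers, but the lookup that fails is the same: the loader imports the recorded module and asks it for the recorded class name.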

I had the same error message when I was trying to save a PyTorch class like this:

import torch.nn as nn

class custom(nn.Module):
    def __init__(self):
        super(custom, self).__init__()
        print("Class loaded")

model = custom()

And then using joblib to dump this model like so:

from joblib import dump
dump(model, 'some_filepath.jobjib')

The issue was that I was running the code above in a Kaggle kernel, and then downloading the dumped file and trying to load it locally with this script:

from joblib import load
model = load('some_filepath.jobjib')  # load() takes the file path as its first argument

The way I fixed the issue was to run all of these code snippets locally on my computer, instead of creating the class and dumping it on Kaggle but loading it on my local machine. I wanted to add this here because the comments on the answer by @DarioB confused me with their reference to a 'function', which didn't apply in my simpler case.
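A possible alternative, sketched here as an assumption rather than something either answer above tested: if the class lives in an importable module at the moment the object is dumped, the pickle records that module name instead of '__main__', and any process that can import the module should be able to load the file. The module name my_transformers and the class SelectColumns are hypothetical, and plain pickle stands in for joblib.

```python
import importlib
import pickle
import sys
import tempfile
import textwrap
from pathlib import Path

# Write a small importable module (hypothetical name) to a temp directory.
workdir = Path(tempfile.mkdtemp())
(workdir / "my_transformers.py").write_text(textwrap.dedent("""
    class SelectColumns:
        def __init__(self, columns):
            self.columns = list(columns)
"""))
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

# Producer side: import the class from the module, then dump the object.
my_transformers = importlib.import_module("my_transformers")
artifact = workdir / "pipeline.pkl"
artifact.write_bytes(pickle.dumps(my_transformers.SelectColumns(["a", "b"])))

# Consumer side: forget the module to mimic a fresh process, then load.
# pickle re-imports my_transformers by name and finds the class there.
del sys.modules["my_transformers"]
restored = pickle.loads(artifact.read_bytes())
print(type(restored).__module__, restored.columns)
```

In the question's layout this would mean creating and dumping the pipeline from code that imports the transformers from app.pipeline, rather than defining them in the notebook itself.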
