
Error when using transform method from Latent Dirichlet Allocation after unpickling

I've trained a Latent Dirichlet Allocation model using sklearn. After unpickling it, I use the CountVectorizer to transform a document and then transform that instance with the LDA model to get its topic distribution, but I get the following error:

AttributeError: module '__main__' has no attribute 'tokenize'

Here is my code:

# load the trained LDA model
lda = joblib.load('lda_good.pkl')

# load the fitted vectorizer
tf_vect = joblib.load('tf_vectorizer_.pkl')

texts = readContent('doc_name.pdf')

new_doc = tf_vect.transform(texts)
print(new_doc)

print(lda.transform(new_doc))

The thing is that the unpickled CountVectorizer object works fine and I can use its .transform, but when I then call .transform on the LDA model, the error seems to refer to the tokenize function from the CountVectorizer. The tokenize function is defined earlier in the code, but I can't understand what tokenize has to do with the transform method of Latent Dirichlet Allocation. A weird thing is that all of this code works fine in a Jupyter notebook but not when I run it as a script.
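For what it's worth, I can reproduce the mechanism without sklearn at all: pickling a function defined at top level seems to store only a reference to it by module and name, not its body (the tokenize here is just a stand-in for my real one):

```python
import pickle

# Stand-in for my real tokenize (the real one is defined in the training notebook)
def tokenize(text):
    return text.lower().split()

payload = pickle.dumps(tokenize)
# Only the module and the name are serialised, not the function's code:
print(b"tokenize" in payload)  # True
```

So whichever process unpickles the vectorizer apparently has to be able to resolve that name again.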

All the code is in a single file. The model was trained in a Jupyter notebook, and now I am trying to use the model from a script.

Here is the traceback:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    prepare(preparation_data)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 226, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 278, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 254, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\eduard.bermejo\Documents\Machine learning\gestió documental\POC\program_POC.py", line 160, in <module>
    tf_vect = joblib.load('tf_vectorizer_.pkl')
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 459, in load
    obj = unpickler.load()
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1039, in load
    dispatch[key[0]](self)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1334, in load_global
    klass = self.find_class(module, name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1388, in find_class
    return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'tokenize'

It actually continues, printing the same traceback over and over, interleaved from newly spawned processes, in what looks like a loop, but I guess this is enough.

Let me know if further information is needed.

Thanks in advance.

Looking through similar questions on SO, this looks like a pickling/unpickling issue. I assume the code that ran joblib.dump lives in a different file from this program. Could you put that code in the same file as this program and re-run the pickling and unpickling? The pickle stores a reference to tokenize in the __main__ module of the process that created it, and the unpickler tries to look that name up in the current __main__ when it runs.
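To illustrate (a sketch, with a hypothetical tokenize standing in for the real one): the pickle only holds the function's name, so the script that unpickles has to expose the same name at top level, either by defining tokenize in it or by importing it from a shared module before calling joblib.load.

```python
import pickle
import sys

# Hypothetical stand-in for the real tokenize that lived in the training
# notebook's __main__ and is referenced by name inside the pickled vectorizer.
def tokenize(text):
    return text.lower().split()

this_module = sys.modules[__name__]
payload = pickle.dumps(tokenize)   # stores a reference by module + name
saved = tokenize

# A process that does not define tokenize reproduces the question's error:
del this_module.tokenize
try:
    pickle.loads(payload)
    failed = False
except AttributeError:
    failed = True
print(failed)  # True: the unpickler cannot resolve the name

# The fix: define (or import) the same name at top level before unpickling.
this_module.tokenize = saved
restored = pickle.loads(payload)
print(restored("Hello World"))
```

Putting tokenize in its own importable module (and importing it both at training time and in the script) is the more robust variant, because the pickle then references the module by name instead of __main__.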
