AttributeError: Can't get attribute 'InsertNews' on <module '__main__'
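I am trying to write a program that scrapes website content. The script seems to run for a while, but stops after a few iterations with the following error: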
Traceback (most recent call last):
  File "D:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\util.py", line 300, in _run_finalizers
    finalizer()
  File "D:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "D:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\pool.py", line 581, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "D:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\pool.py", line 568, in _help_stuff_finish
    inqueue._reader.recv()
  File "D:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'InsertNews' on <module '__main__' from 'c:\\program files (x86)\\microsoft visual studio\\2019\\common7\\ide\\extensions\\microsoft\\python\\core\\debugpy\\__main__.py'>
This is the script I am trying to run:
from boilerpy3 import extractors
import pymongo
import multiprocessing as mp

def InsertNews(newsite, symbol):
    print(symbol)
    print(newsite)
    extractor = extractors.ArticleExtractor()
    try:
        content = extractor.get_content_from_url(newsite)
    except Exception:
        pass
    print(content)
    record={symbol,content}
    mydb["StocksPressRelease"].insert_one(record)

if __name__ == "__main__":
    print("started")
    pool = mp.Pool(mp.cpu_count())
    myclient = pymongo.MongoClient("mongodb+srv://un:pwd@cluster0.subkd.azure.mongodb.net/db?retryWrites=true&w=majority&connectTimeoutMS=900000")
    mydb = myclient["db"]
    mycol = mydb["Stocks"]
    for x in mycol.find({},{"_id": 0, "symbol":1, "newsite": 1 }):
        results = pool.apply_async(InsertNews,args=(x["newsite"],x["symbol"]))
    pool.close()
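From what I read in this post, multiprocessing pools do not work properly with objects that are not defined in an importable module. The traceback fits that: the task is unpickled against a `__main__` that is debugpy's `__main__.py`, and that module has no `InsertNews` attribute. You could try writing the InsertNews function in a separate module and then importing it.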
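File: news.py
from boilerpy3 import extractors

def InsertNews(newsite, symbol):
    print(symbol)
    print(newsite)
    extractor = extractors.ArticleExtractor()
    try:
        content = extractor.get_content_from_url(newsite)
    except Exception as exc:
        # Report the failure and skip this site; otherwise `content`
        # would be unbound when it is printed below.
        print("failed to fetch", newsite, exc)
        return
    print(content)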
File: main.py
import pymongo
import multiprocessing as mp
import news

if __name__ == "__main__":
    print("started")
    pool = mp.Pool(mp.cpu_count())
    myclient = pymongo.MongoClient("mongodb+srv://un:pwd@cluster0.subkd.azure.mongodb.net/db?retryWrites=true&w=majority&connectTimeoutMS=900000")
    mydb = myclient["db"]
    mycol = mydb["Stocks"]
    for x in mycol.find({},{"_id": 0, "symbol":1, "newsite": 1 }):
        results = pool.apply_async(news.InsertNews,args=(x["newsite"],x["symbol"]))
    pool.close()
    # Wait for the queued tasks to finish before the main process exits.
    pool.join()
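As a rough illustration of why the module matters: pickle stores a function as a reference (its module name plus its qualified name) rather than as code, and whoever unpickles it has to find that name again. The sketch below uses only the standard library. `json.dumps` stands in for `news.InsertNews`, and the helper names `is_picklable` and `make_local_function` are made up for this example.

```python
import json
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_local_function():
    # A function defined inside another function cannot be looked up as
    # a module-level attribute, so pickle has nothing to reference.
    def local_function(x):
        return x
    return local_function


# Pickling a module-level function stores only where to find it:
# the module name and the function name both appear in the payload.
payload = pickle.dumps(json.dumps)
print(b"json" in payload, b"dumps" in payload)

# Unpickling imports the module and fetches the attribute by name,
# so the result is the very same function object.
print(pickle.loads(payload) is json.dumps)

# An importable function can be pickled; a local one cannot.
print(is_picklable(json.dumps))
print(is_picklable(make_local_function()))
```

The same lookup happens when a pool task is unpickled. If the recorded module is `__main__` and the process doing the unpickling has a different `__main__` (here, debugpy's), the name cannot be found, which matches the AttributeError in the question.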