Gunicorn/Flask API to expose an sklearn model not working

I can't seem to figure this out. I've got a model trained with scikit-learn and saved to a .pkl file, and I want to build an API that makes predictions with it.
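A minimal sketch of the serving side, assuming the pickled object exposes a scikit-learn-style predict(rows) method. ThresholdModel, get_model and predict_handler are hypothetical names for illustration. The idea is to unpickle the model once per process and reuse it across requests instead of reloading it on every call.

```python
import pickle


class ThresholdModel:
    """Hypothetical stand-in for a fitted scikit-learn estimator."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        # 1 if the row's values sum above the threshold, else 0.
        return [int(sum(row) > self.threshold) for row in rows]


_MODEL = None  # per-process cache; each Gunicorn worker gets its own copy


def get_model(path):
    """Unpickle the model on first use, then return the cached instance."""
    global _MODEL
    if _MODEL is None:
        with open(path, "rb") as fh:
            _MODEL = pickle.load(fh)
    return _MODEL


def predict_handler(payload, model_path):
    """Framework-agnostic body of a predict endpoint: dict in, dict out."""
    model = get_model(model_path)
    return {"predictions": model.predict(payload["rows"])}
```

In a Flask view this would be wrapped roughly as return jsonify(predict_handler(request.get_json(), "model.pkl")), where the file name is an assumption.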

I already have the code that makes predictions, and it runs fine from the console and from unit tests. To speed up predictions, I split the data (thousands of image patches) and spread the load using joblib/multiprocessing.
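The split-and-spread approach can be sketched with only the standard library. This swaps joblib for concurrent.futures.ProcessPoolExecutor with an explicit forkserver context. It is an equivalent technique, not the asker's code. predict_chunk is a hypothetical stand-in for calling model.predict on one chunk of patches.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous slices of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def predict_chunk(chunk):
    """Hypothetical stand-in for model.predict(chunk): 1 if a patch sums above 1.0."""
    return [int(sum(patch) > 1.0) for patch in chunk]


def predict_parallel(patches, n_jobs=2, start_method="forkserver"):
    """Predict every patch, spreading the chunks over n_jobs worker processes."""
    chunks = split_into_chunks(patches, n_jobs)
    if n_jobs == 1:
        # Same result without child processes (like joblib with n_jobs=1).
        results = [predict_chunk(chunk) for chunk in chunks]
    else:
        ctx = multiprocessing.get_context(start_method)
        with ProcessPoolExecutor(max_workers=n_jobs, mp_context=ctx) as pool:
            # map() yields results in submission order, so patch order is kept.
            results = list(pool.map(predict_chunk, chunks))
    return [label for chunk_result in results for label in chunk_result]
```

With forkserver (or spawn), predict_chunk must be defined at the top level of an importable module, and a script entry point needs the usual if __name__ == "__main__": guard. The real model must either be picklable or be loaded inside each worker.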

I'm setting JOBLIB_START_METHOD=forkserver because scikit-learn hangs when it's used inside a process started by multiprocessing.
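If you manage processes yourself instead of through joblib, you can choose the start method per pool rather than globally. The helper below, context_from_env (a hypothetical name), reads the same JOBLIB_START_METHOD variable only as a convention of this sketch. It does not reproduce how joblib itself consumes that variable.

```python
import multiprocessing
import os


def context_from_env(var="JOBLIB_START_METHOD", default="forkserver"):
    """Return a multiprocessing context named by an environment variable.

    The returned context can be passed as mp_context to ProcessPoolExecutor
    or used as ctx.Pool(...); the global default start method is untouched.
    """
    method = os.environ.get(var, default)
    available = multiprocessing.get_all_start_methods()
    if method not in available:
        raise ValueError(f"{var}={method!r} is not one of {available}")
    return multiprocessing.get_context(method)
```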

I've built an API with Flask that uses this code, and it works fine when run with Flask's dev server. Now I'm trying to host the Flask app with Gunicorn, and it doesn't work at all.

If I use the default workers, it just hangs with no errors when trying to predict, much as if I hadn't set the 'forkserver' start method. I'm running Gunicorn like this:

JOBLIB_START_METHOD=forkserver gunicorn -w 2 -b 0.0.0.0:$PORT --timeout 3600 web.app:app
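You can also keep the same settings in a Gunicorn configuration file, which is plain Python and lets each flag carry a comment. This is a sketch. bind, workers, worker_class, timeout and raw_env are Gunicorn settings, while the file name gunicorn.conf.py and the fallback port 8000 are assumptions.

```python
# gunicorn.conf.py: equivalent of the command line above (a sketch).
import os

# -b 0.0.0.0:$PORT (falls back to 8000 if PORT is unset)
bind = "0.0.0.0:" + os.environ.get("PORT", "8000")

# -w 2: two worker processes, using the default synchronous worker class
workers = 2
worker_class = "sync"

# --timeout 3600: allow long-running prediction requests
timeout = 3600

# JOBLIB_START_METHOD=forkserver, set in the environment Gunicorn runs in
raw_env = ["JOBLIB_START_METHOD=forkserver"]
```

Run it with gunicorn -c gunicorn.conf.py web.app:app.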

I also tried the gevent worker. That actually works, but it's very slow, and it prints this:

Multiprocessing backed parallel loops cannot be nested below threads, setting n_jobs=1

So, any ideas on getting this to work with multiple web workers running (I don't think that's the case with Flask's dev server) while a single request can still leverage joblib/multiprocessing? Thanks.

Gevent won't work with joblib because it handles requests concurrently in (green) threads (refer to this discussion), and that's exactly what your warning says. Second, it's very slow because joblib then sets n_jobs=1 and runs your parallel calls sequentially (refer to this discussion).
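The message means joblib's multiprocessing backend detected that it was not running on the main thread and fell back to n_jobs=1. Below is a simplified imitation of that kind of check. effective_n_jobs and run_in_thread are our own illustrative functions, not joblib's API. The sketch shows that the same call gets a different answer when made from a non-main thread. Under gevent's monkey-patching, a request handler is likewise not seen as the main thread, so every request takes the sequential path.

```python
import threading
import warnings


def effective_n_jobs(n_jobs):
    """Simplified imitation of the guard behind the warning above."""
    if n_jobs != 1 and threading.current_thread() is not threading.main_thread():
        warnings.warn(
            "Multiprocessing-backed parallel loops cannot be nested below "
            "threads, setting n_jobs=1"
        )
        return 1
    return n_jobs


def run_in_thread(fn, *args):
    """Run fn(*args) on a new (non-main) thread and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn(*args)))
    worker.start()
    worker.join()
    return result[0]
```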

I faced the same problem when parallelizing work with joblib. Although I didn't use sklearn, I think the following command should work for you as well:

gunicorn -b 0.0.0.0:$SERVICE_PORT --workers=2 -t $SERVICE_TIMEOUT rest_api:app

If you want to look at the complete source code, you can find it here.
