How to load a TensorFlow saved model into memory once and never have to load it again after deploying to Google App Engine?

I am using the TensorFlow Hub model "Universal Sentence Encoder". Sometimes App Engine shuts down automatically, and loading the model again takes a long time. How can I make the model stay in memory?

    runtime: python
    env: flex

    runtime_config:
      python_version: 3

    automatic_scaling:
      min_num_instances: 1
      max_num_instances: 1
      cpu_utilization:
        target_utilization: 1
    readiness_check:
      app_start_timeout_sec: 1800
    liveness_check:
      path: "/liveness_check"
      check_interval_sec: 30
    resources:
      cpu: 1
      memory_gb: 6
      disk_size_gb: 15
    entrypoint: gunicorn -k uvicorn.workers.UvicornWorker -w 4 app.main:app --timeout 1000

Logs:

    A 2021-03-22T10:53:59Z [2021-03-22 10:53:59 +0000] [8] [INFO] Started server process [8]
    A 2021-03-22T10:53:59Z [2021-03-22 10:53:59 +0000] [8] [INFO] Waiting for application startup.
    A 2021-03-22T10:53:59Z [2021-03-22 10:53:59 +0000] [8] [INFO] Application startup complete.
    A 2021-03-22T11:01:03Z [2021-03-22 11:01:03 +0000] [1] [INFO] Handling signal: term
    A 2021-03-22T11:01:03Z [2021-03-22 11:01:03 +0000] [8] [INFO] Shutting down
    A 2021-03-22T11:01:03Z [2021-03-22 11:01:03 +0000] [8] [INFO] Error while closing socket [Errno 9] Bad file descriptor
    A 2021-03-22T11:01:04Z [2021-03-22 11:01:04 +0000] [8] [INFO] Waiting for application shutdown.
    A 2021-03-22T11:01:04Z [2021-03-22 11:01:04 +0000] [8] [INFO] Application shutdown complete.
    A 2021-03-22T11:01:04Z [2021-03-22 11:01:04 +0000] [8] [INFO] Finished server process [8]
    A 2021-03-22T11:01:04Z [2021-03-22 11:01:04 +0000] [8] [INFO] Worker exiting (pid: 8)

With App Engine flex, your application is started when it is deployed; because min_num_instances is 1, one instance is always running. In your startup routine, load the model into memory and keep it in a global variable. That way, every request your service receives can use the already-loaded model.
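A minimal sketch of the "load once at startup, keep in a global" idea, not the asker's actual code. The names `init_model`, `get_model`, and `_load_from_hub` are hypothetical, and `MODEL_URL` is an assumed TF Hub handle for the Universal Sentence Encoder; substitute whichever version you actually use. The TensorFlow Hub import is done lazily so the module can be imported without loading the model.

```python
import threading
from typing import Any, Callable, Optional

# Assumption: TF Hub handle for the Universal Sentence Encoder (adjust as needed).
MODEL_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"

# Module-level global: lives as long as the worker process lives.
_model: Optional[Any] = None
_lock = threading.Lock()


def _load_from_hub() -> Any:
    """Default loader; imports tensorflow_hub only when actually called."""
    import tensorflow_hub as hub
    return hub.load(MODEL_URL)


def init_model(loader: Callable[[], Any] = _load_from_hub) -> Any:
    """Load the model at most once per process and cache it in the global."""
    global _model
    with _lock:
        if _model is None:
            _model = loader()
    return _model


def get_model() -> Any:
    """Return the cached model; request handlers call this instead of loading."""
    if _model is None:
        raise RuntimeError("model not loaded; call init_model() at startup")
    return _model
```

You would call `init_model()` from the application's startup hook (for example a startup event handler, if the app behind `app.main:app` is built on FastAPI or Starlette) and `get_model()` inside request handlers. Note that with `-w 4` in the entrypoint each gunicorn worker is a separate process with its own globals, so the model is held once per worker.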

However, there are some caveats:

  • App Engine flex instances are restarted at least once a week to update the underlying platform, so your instance will restart at least weekly. Because you load your model at startup, this reload happens before requests are served and does not affect request response time.
  • When the service scales up (not your case, since max_num_instances is 1), a new instance is created and the model is loaded into memory again on that instance.
  • Similarly, if your instance crashes (for example, because it runs out of memory or hits an unhandled exception), a new one is created and the model is loaded again at that time.
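Since every restart in the caveats above reloads the model, one way to keep that reload away from user requests is to report "not ready" until loading has finished. The sketch below is framework-agnostic and hypothetical (`ModelHolder` and `readiness_status` are illustrative names); it assumes, as I understand the App Engine flex health checks, that an instance only receives traffic once its readiness check returns HTTP 200.

```python
from typing import Any, Callable, Optional, Tuple


class ModelHolder:
    """Holds the model for this process and records whether it is loaded."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None

    def load(self) -> None:
        # Called once from the startup routine; may take a long time.
        self._model = self._loader()

    @property
    def ready(self) -> bool:
        return self._model is not None

    @property
    def model(self) -> Any:
        if self._model is None:
            raise RuntimeError("model is not loaded yet")
        return self._model


def readiness_status(holder: ModelHolder) -> Tuple[int, str]:
    """Return (HTTP status, body) for a readiness endpoint."""
    if holder.ready:
        return 200, "ok"
    return 503, "model loading"
```

A web handler mounted on the readiness path would simply return the status and body from `readiness_status(holder)`. The `app_start_timeout_sec: 1800` setting in the question's configuration already leaves room for a slow model load.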
