Google Cloud Run - Executing background job with always-allocated CPU

I have a web app and a background worker service running in Cloud Run.
The main app calls the background worker, which is essentially just an rq worker wrapped in a thin Flask app to adhere to the runtime contract. The rq worker is spawned via subprocess.Popen.
I do not block the main thread with the Popen call, and I return a response immediately. However, the instance still seems to die after 15 minutes of processing.

Per the documentation, it appears this workflow should be supported, so long as there is some sort of CPU processing going on (the docs aren't exactly clear):

If you want to support background activities in your Cloud Run service, set your Cloud Run service CPU to be always allocated so you can run background activities outside of requests and still have CPU access.
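
For reference, that setting can be applied from the gcloud CLI (a sketch; worker-service and the region are placeholders, not from the original post):

gcloud run services update worker-service --no-cpu-throttling --region us-central1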

Another article says:

Note that even if CPU is always allocated, Cloud Run autoscaling is still in effect, and may terminate container instances if they aren't needed to handle incoming traffic. An instance will never stay idle for more than 15 minutes after processing a request unless it is kept active using minimum instances.

This 15-minute limit seems to be what I'm encountering despite the CPU certainly not being "idle" in any sense of the word.

The particular background jobs I am spawning could potentially take 1-2 hours in extreme cases, so blocking the main thread, not returning a response until completion, and increasing the request timeout would not work, as the timeout maxes out at 1 hour (not to mention that approach is prone to error).

Is there a way to make this work without moving toward GKE or hacky Cloud Build workarounds?

EDIT - Some additional details

Worker service configuration:

  • CPU is always allocated
  • Maximum requests per container = 1
  • Min instances 0
  • Max instances 10

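The same settings, expressed as a single gcloud command (a sketch; the service name and region are placeholders):

gcloud run services update worker-service \
    --no-cpu-throttling \
    --concurrency 1 \
    --min-instances 0 \
    --max-instances 10 \
    --region us-central1
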
Here is the server that spawns the rq worker:

import os
import subprocess

from flask import Flask, Response
from http import HTTPStatus

app = Flask(__name__)

@app.route("/")
def index():
    # Spawn a burst-mode rq worker in the background; Popen returns
    # immediately, so the request handler does not block on the job.
    subprocess.Popen(
        ["rq", "worker", "--burst", "--url", os.getenv("REDIS_URL"), "queue"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    return Response(status=HTTPStatus.OK)

The Dockerfile for this service just runs the following command after setting things up:

gunicorn -w 1 --timeout 0 -b 0.0.0.0:8080 app:app
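
For context, the full Dockerfile might look roughly like this (a sketch; the base image and requirements file are assumptions, only the final command is from the original):

FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Cloud Run expects the container to listen on port 8080 by default.
CMD exec gunicorn -w 1 --timeout 0 -b 0.0.0.0:8080 app:app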

The logs do not yield anything particularly useful, because I never call communicate() or otherwise read the output of the Popen call (to avoid blocking the main thread). As a result I am left with just the gunicorn logs, which isn't ideal.
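
One way to surface the worker's output (a sketch, not part of the original setup) is to let the child process inherit the parent's stdout/stderr instead of piping them, so anything the rq worker prints is collected by Cloud Run logging:

import os
import subprocess

# Omitting the stdout/stderr arguments makes the child process inherit the
# parent's streams, so the rq worker's output reaches the container logs.
subprocess.Popen(["rq", "worker", "--burst", "--url", os.getenv("REDIS_URL"), "queue"])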

Using gunicorn --timeout 0 and blocking the thread to avoid sending an HTTP response works for up to 60 minutes of processing time (i.e., the highest request timeout allowed by Cloud Run). You must configure this timeout on Cloud Run, as the default is 5 minutes.
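
Raising the limit might look like this (a sketch; the service name is a placeholder, and the timeout is given in seconds):

gcloud run services update worker-service --timeout 3600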

This is not an ideal solution, though:

The longer the timeout is, the more likely the connection can be lost due to failures on the client side or the Cloud Run side. When a client re-connects, a new request is initiated and the client isn't guaranteed to connect to the same container instance of the service.

https://cloud.google.com/run/docs/configuring/request-timeout

Otherwise, GKE or Compute Engine work for this type of workload.
