By "Google Batch" I'm referring to the new service Google launched about a month or so ago.
https://cloud.google.com/batch
I have a Python script which takes a few minutes to execute at the moment. However, with the data it will be processing in the coming months, this execution time will go from minutes to hours. This is why I am not using Cloud Functions or Cloud Run to run this script: both have a maximum 60-minute execution time.
Google Batch came about recently and I wanted to explore this as a possible method to achieve what I'm looking for without just using Compute Engine.
However, documentation is sparse across the internet and I can't find a way to "trigger" an already created Batch job using Cloud Scheduler. I've already successfully created a Batch job manually which runs my Docker image. Now I need something to trigger this Batch job once a day, that's it. It would be wonderful if Cloud Scheduler could serve this purpose.
I've seen one article describing using GCP Workflows to create a new Batch job on a cron schedule set by Cloud Scheduler. The issue is that this creates a new Batch job every time rather than re-running the existing one. To be honest, I can't even re-run an already executed Batch job from the GCP console itself, so I don't know whether that's even possible.
https://www.intertec.io/resource/python-script-on-gcp-batch
Lastly, I've even explored the official Google Batch Python library and could not find any built-in function that lets me "call" a previously created Batch job and simply re-run it.
There is a misunderstanding. With Cloud Run jobs, you create a configuration and then execute that configuration.
With Batch jobs, however, you simply execute a configuration. That's all; there is no configuration to create in advance.
Have a look at the APIs: Create, Get, Delete. Nothing more.
Therefore, you have to put the whole Batch configuration in your Cloud Scheduler job so that each run creates a new Batch job. Take care NOT to set the jobID in the query parameter (a fixed ID would make every run after the first fail, since the job would already exist).
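If you do want deterministic control over the IDs instead, each scheduled run must use a fresh, unique ID that satisfies Batch's naming rules (start with a lowercase letter; only lowercase letters, digits, and hyphens; at most 63 characters). A minimal sketch, assuming a timestamp suffix is acceptable (the helper name is mine):

```python
import re
import time

def make_job_id(prefix: str = "daily-job") -> str:
    # Append a Unix timestamp so every run gets a distinct ID. Batch job
    # IDs must start with a lowercase letter and may contain only
    # lowercase letters, digits, and hyphens (63 characters maximum).
    job_id = f"{prefix}-{int(time.time())}"
    if not re.fullmatch(r"[a-z][a-z0-9-]{0,62}", job_id):
        raise ValueError(f"invalid Batch job ID: {job_id}")
    return job_id

print(make_job_id())
```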
I wrote this for you this morning as a guide.
It uses Google's example in combination with Cloud Scheduler:
# main.py: creates a Cloud Scheduler job that POSTs a Batch Job
# configuration to the Batch API on a schedule
import os

import google.cloud.batch_v1.types
import google.cloud.scheduler_v1
import google.cloud.scheduler_v1.types

project = os.getenv("PROJECT")
number = os.getenv("NUMBER")
location = os.getenv("LOCATION")
job = os.getenv("JOB")

# Batch Job
# Create the Batch Job using batch_v1.types
# Alternatively, create this from scratch
batch_job = google.cloud.batch_v1.types.Job(
    priority=0,
    task_groups=[
        google.cloud.batch_v1.types.TaskGroup(
            task_spec=google.cloud.batch_v1.types.TaskSpec(
                runnables=[
                    google.cloud.batch_v1.types.Runnable(
                        container=google.cloud.batch_v1.types.Runnable.Container(
                            image_uri="gcr.io/google-containers/busybox",
                            entrypoint="/bin/sh",
                            commands=[
                                "-c",
                                "echo \"Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks.\"",
                            ],
                        ),
                    ),
                ],
                compute_resource=google.cloud.batch_v1.types.ComputeResource(
                    cpu_milli=2000,
                    memory_mib=16,
                ),
            ),
            task_count=1,
            parallelism=1,
        ),
    ],
    allocation_policy=google.cloud.batch_v1.types.AllocationPolicy(
        location=google.cloud.batch_v1.types.AllocationPolicy.LocationPolicy(
            allowed_locations=[
                f"regions/{location}",
            ],
        ),
        instances=[
            google.cloud.batch_v1.types.AllocationPolicy.InstancePolicyOrTemplate(
                policy=google.cloud.batch_v1.types.AllocationPolicy.InstancePolicy(
                    machine_type="e2-standard-2",
                ),
            ),
        ],
    ),
    labels={
        "stackoverflow": "73966292",
    },
    logs_policy=google.cloud.batch_v1.types.LogsPolicy(
        destination=google.cloud.batch_v1.types.LogsPolicy.Destination.CLOUD_LOGGING,
    ),
)

# Convert the Batch Job into JSON
# Google uses proto-plus:
# https://proto-plus-python.readthedocs.io/en/stable/messages.html?highlight=JSON#serialization
batch_json = google.cloud.batch_v1.types.Job.to_json(batch_job)
print(batch_json)

# Convert JSON to bytes, as required for the Cloud Scheduler body
body = batch_json.encode("utf-8")

# Run hourly on the hour (HH:00); for once a day use e.g. "0 3 * * *"
schedule = "0 * * * *"

parent = f"projects/{project}/locations/{location}"
name = f"{parent}/jobs/{job}"
# NOTE: no job_id query parameter: a fixed job_id would fail after the
# first run because the job would already exist
uri = f"https://batch.googleapis.com/v1/{parent}/jobs"
service_account_email = f"{number}-compute@developer.gserviceaccount.com"

scheduler_job = google.cloud.scheduler_v1.types.Job(
    name=name,
    description="description",
    http_target=google.cloud.scheduler_v1.types.HttpTarget(
        uri=uri,
        http_method=google.cloud.scheduler_v1.types.HttpMethod.POST,
        oauth_token=google.cloud.scheduler_v1.types.OAuthToken(
            service_account_email=service_account_email,
        ),
        body=body,
    ),
    schedule=schedule,
)

scheduler_json = google.cloud.scheduler_v1.Job.to_json(scheduler_job)
print(scheduler_json)

request = google.cloud.scheduler_v1.CreateJobRequest(
    parent=parent,
    job=scheduler_job,
)
scheduler_client = google.cloud.scheduler_v1.CloudSchedulerClient()
print(
    scheduler_client.create_job(
        request=request,
    )
)
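For reference, the body Cloud Scheduler ends up POSTing is just the Batch Job resource serialized as JSON, with camelCase field names as in the REST API. A stripped-down sketch of such a payload using only the standard library (this is my illustration, not output from the client library):

```python
import json

# Minimal Batch Job payload; REST/JSON field names are camelCase
payload = {
    "taskGroups": [{
        "taskSpec": {
            "runnables": [{
                "container": {
                    "imageUri": "gcr.io/google-containers/busybox",
                    "entrypoint": "/bin/sh",
                    "commands": ["-c", "echo Hello"],
                },
            }],
            "computeResource": {"cpuMilli": 2000, "memoryMib": 16},
        },
        "taskCount": 1,
    }],
    "logsPolicy": {"destination": "CLOUD_LOGGING"},
}

# Cloud Scheduler's HTTP target body must be bytes
body = json.dumps(payload).encode("utf-8")
print(body.decode("utf-8"))
```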
You can test using:
BILLING="..."
PROJECT="..."
LOCATION="..." # E.g. us-west1
JOB="tester"
ACCOUNT="tester"
EMAIL="${ACCOUNT}@${PROJECT}.iam.gserviceaccount.com"
# Create Project and enable Billing
gcloud projects create ${PROJECT}
gcloud beta billing projects link ${PROJECT} \
--billing-account=${BILLING}
# Enable Batch, Cloud Scheduler and Compute Engine
SERVICES=(
"batch"
"cloudscheduler"
"compute"
)
for SERVICE in "${SERVICES[@]}"
do
  gcloud services enable ${SERVICE}.googleapis.com \
  --project=${PROJECT}
done
# Create Service Account
gcloud iam service-accounts create ${ACCOUNT} \
--project=${PROJECT}
gcloud iam service-accounts keys create ${PWD}/${ACCOUNT}.json \
--iam-account=${EMAIL} \
--project=${PROJECT}
# IAM
# https://cloud.google.com/iam/docs/understanding-roles#cloud-scheduler-roles
ROLES=(
"roles/batch.jobsEditor"
"roles/cloudscheduler.admin"
)
for ROLE in "${ROLES[@]}"
do
  gcloud projects add-iam-policy-binding ${PROJECT} \
  --member=serviceAccount:${EMAIL} \
  --role=${ROLE}
done
# ActAs
NUMBER=$(\
gcloud projects describe ${PROJECT} \
--format="value(projectNumber)")
COMPUTE_ENGINE="${NUMBER}-compute@developer.gserviceaccount.com"
gcloud iam service-accounts add-iam-policy-binding ${COMPUTE_ENGINE} \
--member=serviceAccount:${EMAIL} \
--role="roles/iam.serviceAccountUser" \
--project=${PROJECT}
Then:
python3 -m venv venv
source venv/bin/activate
# Or requirements.txt
python3 -m pip install google-cloud-batch
python3 -m pip install google-cloud-scheduler
export JOB
export LOCATION
export NUMBER
export PROJECT
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/${ACCOUNT}.json
python3 main.py
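Once the scheduler job exists, you can force a run immediately and check that a Batch job was created, rather than waiting for the cron (assuming gcloud is authenticated against the same project):

```shell
# Trigger the Cloud Scheduler job now
gcloud scheduler jobs run ${JOB} \
--location=${LOCATION} \
--project=${PROJECT}

# List Batch jobs to confirm a new one was created
gcloud batch jobs list \
--location=${LOCATION} \
--project=${PROJECT}
```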