
How to manually copy executable to workers with Apache Beam Dataflow on GCP

I'm somewhat new to Beam and GCP. Following this document and the Beam 'subprocess' examples, I've been working on a simple Java pipeline that runs a C binary. It runs fine with the DirectRunner, and I'm now trying to get it to run in the cloud. With the file staged in a Cloud Storage bucket, I get the error: 'Cannot run program "gs://mybucketname/tmp/grid_working_files/Echo": error=2, No such file or directory', which makes sense, since I guess you can't execute a file directly out of Cloud Storage. Where I'm stuck now is how to move the executable to the worker. The document states:

When you use a native Apache Beam language (Java or Python), the Beam SDK automatically moves all required code to the workers. However, when you make a call to external code, you need to move the code manually.  To move the code, you do the following:

  1. Store the compiled external code, along with versioning information, in Cloud Storage.
  2. In the @Setup method, create a synchronized block to check whether the code file is available on the local resource. Rather than implementing a physical check, you can confirm availability using a static variable when the first thread finishes.
  3. If the file isn't available, use the Cloud Storage client library to pull the file from the Cloud Storage bucket to the local worker. A recommended approach is to use the Beam FileSystems class for this task.
  4. After the file is moved, confirm that the execute bit is set on the code file.
  5. In a production system, check the hash of the binaries to ensure that the file has been copied correctly.
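The steps above could be sketched roughly as follows. This is a minimal sketch using only the Java standard library, not the document's own implementation: `ExecutableStager`, `Source`, `stage`, `sha256Hex`, and the `beam-native-` directory prefix are hypothetical names chosen for illustration. The remote read is hidden behind the `Source` callback so that the once-per-JVM logic (static variable, synchronized block, hash check, execute bit) stays visible.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: stages an external binary on the worker at most once per JVM. */
final class ExecutableStager {

  /** Hypothetical callback that opens the remote file (for example, an object in a bucket). */
  interface Source {
    InputStream open() throws IOException;
  }

  private static final Object LOCK = new Object();

  // Static, so every DoFn instance in this JVM shares the result (step 2).
  private static Path stagedPath;

  private ExecutableStager() {}

  /** Intended to be called from a DoFn's @Setup method. */
  static Path stage(Source source, String fileName, String expectedSha256)
      throws IOException {
    synchronized (LOCK) {
      if (stagedPath != null) {
        return stagedPath; // an earlier thread already copied the file
      }
      // Step 3: pull the file into a fresh local directory on the worker.
      Path dir = Files.createTempDirectory("beam-native-");
      Path target = dir.resolve(fileName);
      try (InputStream in = source.open()) {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
      }
      // Step 5: verify the copy before trusting it.
      String actual = sha256Hex(target);
      if (!actual.equalsIgnoreCase(expectedSha256)) {
        throw new IOException("Checksum mismatch for " + target + ": " + actual);
      }
      // Step 4: make sure the execute bit is set.
      if (!target.toFile().setExecutable(true)) {
        throw new IOException("Could not set the execute bit on " + target);
      }
      stagedPath = target;
      return stagedPath;
    }
  }

  /** Returns the lowercase hex SHA-256 digest of the file's contents. */
  static String sha256Hex(Path file) throws IOException {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(Files.readAllBytes(file));
      StringBuilder hex = new StringBuilder();
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
    }
  }
}
```

In a Beam pipeline, the `Source` would presumably wrap the FileSystems class, something like `() -> Channels.newInputStream(FileSystems.open(FileSystems.matchNewResource("gs://mybucketname/tmp/grid_working_files/Echo", false)))`, and the returned path is what gets passed to the process launcher instead of the `gs://` URL. Where the expected hash comes from (for example, a pipeline option) is left open here.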

I've looked at the FileSystems class, and I think I understand it, but what I don't know is where I need to copy the files to. Is there a known directory or filepath that the workers use? I'm using the Dataflow runner.

You can copy the file to wherever you want in the worker's local filesystem. For example, you could use the tempfile module (in Python; Files.createTempDirectory is the Java counterpart) to create a new, empty temporary directory, and copy your executable into it before running it.
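For a Java pipeline, that suggestion might look like the sketch below. It assumes a Unix-like worker, and `LocalExecutable`, `copyToTempDir`, `run`, and the `worker-bin-` prefix are hypothetical names. The point is that the process is started from a local path, not from the `gs://` URL.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: copies a binary into a fresh temp directory and runs it. */
final class LocalExecutable {

  private LocalExecutable() {}

  /** Copies the stream into a new, empty temporary directory and marks the file executable. */
  static Path copyToTempDir(InputStream in, String fileName) throws IOException {
    Path dir = Files.createTempDirectory("worker-bin-");
    Path target = dir.resolve(fileName);
    Files.copy(in, target);
    if (!target.toFile().setExecutable(true)) {
      throw new IOException("Could not set the execute bit on " + target);
    }
    return target;
  }

  /** Runs the local executable and returns its combined stdout and stderr. */
  static String run(Path executable, String... args)
      throws IOException, InterruptedException {
    List<String> command = new ArrayList<>();
    command.add(executable.toAbsolutePath().toString());
    command.addAll(Arrays.asList(args));

    Process process = new ProcessBuilder(command).redirectErrorStream(true).start();

    // Drain the output before waiting, so a chatty process cannot block on a full pipe.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (InputStream out = process.getInputStream()) {
      byte[] chunk = new byte[8192];
      int read;
      while ((read = out.read(chunk)) != -1) {
        buffer.write(chunk, 0, read);
      }
    }
    String output = new String(buffer.toByteArray(), StandardCharsets.UTF_8);

    int exitCode = process.waitFor();
    if (exitCode != 0) {
      throw new IOException("Process exited with code " + exitCode + ": " + output);
    }
    return output;
  }
}
```

The input stream here stands in for whatever reads the staged file from the bucket. The temporary directory is never deleted in this sketch, so a long-running worker may want to clean it up in @Teardown.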

Using custom containers might be a good solution to this as well, since the executable can then be built into the worker image instead of being copied at startup.
