
Stream file from Google Cloud Storage

Here is the code I use to download a file from Google Cloud Storage:

@Override
public void write(OutputStream outputStream) throws IOException {
    try {
        LOG.info(path);
        InputStream stream = new ByteArrayInputStream(GoogleJsonKey.JSON_KEY.getBytes(StandardCharsets.UTF_8));
        StorageOptions options = StorageOptions.newBuilder()
                .setProjectId(PROJECT_ID)
                .setCredentials(GoogleCredentials.fromStream(stream)).build();
        Storage storage = options.getService();
        final CountingOutputStream countingOutputStream = new CountingOutputStream(outputStream);
        byte[] read = storage.readAllBytes(BlobId.of(BUCKET, path));
        countingOutputStream.write(read);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        outputStream.close();
    }
}

This works, but it has to buffer all the bytes before it streams them back to the caller of this method. That causes long delays, especially when the file stored in GCS is large.

Is there a way to get the file from GCS and stream it directly to the OutputStream? The OutputStream here is, by the way, for a Servlet.

Just to clarify, do you need an OutputStream or an InputStream? One way to look at this is that the data is stored in a Google Cloud Storage object as a file, and you have an InputStream to read that file. If that works, read on.

There is no existing method in the Storage API that provides an InputStream or an OutputStream. But there are two methods in the Cloud Storage client library that expose a ReadChannel object, which extends ReadableByteChannel (from the Java NIO API).

ReadChannel reader(String bucket, String blob, BlobSourceOption... options);
ReadChannel reader(BlobId blob, BlobSourceOption... options);

A simple example using this (taken from StorageSnippets.java ):

/**
   * Example of reading a blob's content through a reader.
   */
  // [TARGET reader(String, String, BlobSourceOption...)]
  // [VARIABLE "my_unique_bucket"]
  // [VARIABLE "my_blob_name"]
  public void readerFromStrings(String bucketName, String blobName) throws IOException {
    // [START readerFromStrings]
    try (ReadChannel reader = storage.reader(bucketName, blobName)) {
      ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);
      while (reader.read(bytes) > 0) {
        bytes.flip();
        // do something with bytes
        bytes.clear();
      }
    }
    // [END readerFromStrings]
  }

You can also use the Channels.newInputStream() method (in java.nio.channels) to wrap the ReadableByteChannel in an InputStream.

public static InputStream newInputStream(ReadableByteChannel ch)

Even if you need an OutputStream, you should be able to copy data from the InputStream or, better, from the ReadChannel object into the OutputStream.
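The channel-to-stream copy described above can be sketched with the JDK alone. This is a minimal sketch, not the library's own API: the class name ChannelCopy and the method copy are hypothetical, and an in-memory channel stands in for the ReadChannel that storage.reader(...) would return. Unlike the snippets in this thread, which loop while read(...) > 0, this version loops until read returns -1, which assumes a blocking channel.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: copies a channel into a stream in fixed-size chunks.
final class ChannelCopy {
  private static final int BUFFER_SIZE = 64 * 1024;

  // Copies everything from src to out and returns the number of bytes copied.
  // Only one buffer's worth of data is held in memory at any time.
  static long copy(ReadableByteChannel src, OutputStream out) throws IOException {
    WritableByteChannel dst = Channels.newChannel(out);
    ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
    long total = 0;
    // Assumes a blocking channel: read returns -1 only at end of stream.
    while (src.read(buffer) != -1) {
      buffer.flip();
      // A single write may not drain the buffer, so loop until it is empty.
      while (buffer.hasRemaining()) {
        total += dst.write(buffer);
      }
      buffer.clear();
    }
    out.flush();
    return total;
  }

  public static void main(String[] args) throws IOException {
    // In-memory stand-in for the GCS ReadChannel (an assumption for this demo).
    byte[] data = "hello, streaming".getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (ReadableByteChannel src = Channels.newChannel(new ByteArrayInputStream(data))) {
      long copied = copy(src, sink);
      System.out.println(copied + " bytes copied");
    }
  }
}
```

In the servlet case from the question, the same helper would presumably be called with storage.reader(BUCKET, path) as the source and the servlet's OutputStream as the destination.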

Complete example

Run this example as: PROGRAM_NAME <BUCKET_NAME> <BLOB_PATH>

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

import com.google.cloud.ReadChannel;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

/**
 * An example which reads the contents of the specified object/blob from GCS
 * and prints the contents to STDOUT.
 *
 * Run it as PROGRAM_NAME <BUCKET_NAME> <BLOB_PATH>
 */
public class ReadObjectSample {
  private static final int BUFFER_SIZE = 64 * 1024;

  public static void main(String[] args) throws IOException {
    // Instantiates a Storage client
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // The name for the GCS bucket
    String bucketName = args[0];
    // The path of the blob (i.e. GCS object) within the GCS bucket.
    String blobPath = args[1];

    printBlob(storage, bucketName, blobPath);
  }

  // Reads from the specified blob present in the GCS bucket and prints the contents to STDOUT.
  private static void printBlob(Storage storage, String bucketName, String blobPath) throws IOException {
    try (ReadChannel reader = storage.reader(bucketName, blobPath)) {
      WritableByteChannel outChannel = Channels.newChannel(System.out);
      ByteBuffer bytes = ByteBuffer.allocate(BUFFER_SIZE);
      while (reader.read(bytes) > 0) {
        bytes.flip();
        outChannel.write(bytes);
        bytes.clear();
      }
    }
  }
}

Currently the cleanest option I could find looks like this:

Blob blob = bucket.get("some-file");
ReadChannel reader = blob.reader();
InputStream inputStream = Channels.newInputStream(reader);

Channels is from java.nio.channels. You can then use Commons IO to copy the InputStream into an OutputStream:

IOUtils.copy(inputStream, outputStream);

Anyone on Java 9 or above can use InputStream.transferTo() to copy to the output stream:


    // the resource URL is something like gs://your-bucket/some/file/path.csv
    public InputStream getUriAsInputStream( Storage storage, String resourceUri) {
        String[] parts = resourceUri.split("/");
        BlobId blobId = BlobId.of(parts[2], String.join("/", Arrays.copyOfRange(parts, 3, parts.length)));
        Blob blob = storage.get(blobId);
        if (blob == null || !blob.exists()) {
            throw new IllegalArgumentException("Blob [" + resourceUri + "] does not exist");
        }
        ReadChannel reader = blob.reader();
        InputStream inputStream = Channels.newInputStream(reader);
        return inputStream;
    }

// use it with something like:
@Override
public void write(OutputStream outputStream) throws IOException {
    try {
        LOG.info(path);
        InputStream stream = new ByteArrayInputStream(GoogleJsonKey.JSON_KEY.getBytes(StandardCharsets.UTF_8));
        StorageOptions options = StorageOptions.newBuilder()
                .setProjectId(PROJECT_ID)
                .setCredentials(GoogleCredentials.fromStream(stream)).build();
        Storage storage = options.getService();

        // try-with-resources closes the GCS stream even if the copy fails
        try (InputStream in = getUriAsInputStream(storage, "gs://your-bucket/path/to/file.csv")) {
            in.transferTo(outputStream);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        outputStream.close();
    }
}

Code based on @Tuxdude's answer:

    @Nullable
    public byte[] getFileBytes(String gcsUri) throws IOException {
        Blob blob = getBlob(gcsUri);
        if (blob == null) {
            return null;
        }
        // Closing the stream also closes the underlying ReadChannel
        try (InputStream inputStream = Channels.newInputStream(blob.reader())) {
            return IOUtils.toByteArray(inputStream);
        }
    }

or

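    // Reads in 64 KiB chunks and accumulates them in a java.io.ByteArrayOutputStream,
    // so the result is not limited to a single 64 KiB buffer
    @Nullable
    public byte[] getFileBytes(String gcsUri) throws IOException {
        Blob blob = getBlob(gcsUri);
        if (blob == null) {
            return null;
        }
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ReadChannel reader = blob.reader()) {
            ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);
            while (reader.read(bytes) > 0) {
                bytes.flip();
                // copy only the bytes read in this pass, not the whole backing array
                result.write(bytes.array(), 0, bytes.limit());
                bytes.clear();
            }
        }
        return result.toByteArray();
    }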

helper code:

   @Nullable
    Blob getBlob(String gcsUri) {
        //gcsUri is "gs://" + blob.getBucket() + "/" + blob.getName(),
        //example "gs://myapp.appspot.com/ocr_request_images/000c121b-357d-4ac0-a3f2-24e0f6d5cea185dffb40eee-850fab211438.jpg"

        String bucketName = parseGcsUriForBucketName(gcsUri);
        String fileName = parseGcsUriForFilename(gcsUri);

        if (bucketName != null && fileName != null) {
            return storage.get(BlobId.of(bucketName, fileName));
        } else {
            return null;
        }
    }

    @Nullable
    String parseGcsUriForFilename(String gcsUri) {
        String fileName = null;
        String prefix = "gs://";
        if (gcsUri.startsWith(prefix)) {
            int startIndexForBucket = gcsUri.indexOf(prefix) + prefix.length() + 1;
            int startIndex = gcsUri.indexOf("/", startIndexForBucket) + 1;
            fileName = gcsUri.substring(startIndex);
        }
        return fileName;
    }

    @Nullable
    String parseGcsUriForBucketName(String gcsUri) {
        String bucketName = null;
        String prefix = "gs://";
        if (gcsUri.startsWith(prefix)) {
            int startIndex = gcsUri.indexOf(prefix) + prefix.length();
            int endIndex = gcsUri.indexOf("/", startIndex);
            bucketName = gcsUri.substring(startIndex, endIndex);
        }
        return bucketName;
    }
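The two parsing helpers above can also be folded into one small value type that rejects malformed input instead of returning null. This is only a sketch: the class name GcsUri, its fields, and the parse method are hypothetical and not part of the Cloud Storage client library.

```java
import java.util.Objects;

// Hypothetical value type holding the two parts of a gs://<bucket>/<object> URI.
final class GcsUri {
  private static final String PREFIX = "gs://";

  final String bucket;
  final String objectName;

  private GcsUri(String bucket, String objectName) {
    this.bucket = bucket;
    this.objectName = objectName;
  }

  // Splits "gs://bucket/path/to/object" into bucket and object name.
  // Throws IllegalArgumentException when the prefix, bucket, or object name is missing.
  static GcsUri parse(String uri) {
    Objects.requireNonNull(uri, "uri");
    if (!uri.startsWith(PREFIX)) {
      throw new IllegalArgumentException("Not a gs:// URI: " + uri);
    }
    // First '/' after the prefix separates the bucket from the object name.
    int slash = uri.indexOf('/', PREFIX.length());
    if (slash <= PREFIX.length() || slash == uri.length() - 1) {
      throw new IllegalArgumentException("Expected gs://<bucket>/<object>: " + uri);
    }
    return new GcsUri(uri.substring(PREFIX.length(), slash), uri.substring(slash + 1));
  }

  public static void main(String[] args) {
    GcsUri parsed = GcsUri.parse("gs://my-bucket/path/to/file.csv");
    System.out.println(parsed.bucket + " / " + parsed.objectName);
  }
}
```

The parsed parts could then be passed to BlobId.of(parsed.bucket, parsed.objectName), as getBlob does above with its two separate helpers.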

Another convenient way to stream a file from Google Cloud Storage is with google-cloud-nio:

Path path = Paths.get(URI.create("gs://bucket/file.csv"));
InputStream in = Files.newInputStream(path);
