
List files changed after a particular timestamp in Google Cloud Storage Bucket

I want to list the files that have been changed or added in a Google Cloud Storage bucket after a particular timestamp, in Node.js. I was going through the documentation at https://cloud.google.com/storage/docs/how-to but did not find any mechanism for this.

const {Storage} = require('@google-cloud/storage');
const storage = new Storage();
const bucketName = 'my-bucket';

// This lists every object in the bucket; getFiles() offers no timestamp filter.
const [files] = await storage.bucket(bucketName).getFiles();

How can I list only the files that were added to a bucket after a given timestamp?

This doesn't appear to be possible with the provided list API. The documentation doesn't mention any way to filter objects by date.

It's common to store data about all uploaded files in a database, which is easier to query for whatever properties of the files you would like to track. You can even use a Cloud Functions trigger to automatically write a record into the database for every file upload, as sketched below.
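A minimal sketch of such a trigger, assuming Firebase Cloud Functions and Firestore (the fileIndex collection name is made up for illustration):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Fires once for every object that finishes uploading to the default bucket.
exports.indexUpload = functions.storage.object().onFinalize(async (object) => {
  // Record the object's name and update time in a queryable collection.
  // encodeURIComponent makes the object name safe to use as a document ID.
  await admin.firestore()
    .collection('fileIndex')
    .doc(encodeURIComponent(object.name))
    .set({
      name: object.name,
      updated: new Date(object.updated), // object.updated is an RFC 3339 string
    });
});

Listing files changed after a timestamp then becomes an indexed query, e.g. admin.firestore().collection('fileIndex').where('updated', '>', someDate).get().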

There is no direct function to do so, as Doug Stevenson pointed out; however, you can find out when a file was last modified by inspecting its metadata. For example, see this code snippet:

const {Storage} = require('@google-cloud/storage');
const storage = new Storage();
const bucketName = 'your-bucket-name';

storage.bucket(bucketName).getFiles(function (err, files) {
  if (!err) {
    // files is an array of File objects.
    files.forEach(function (file) {
      file.getMetadata(function (err, metadata) {
        // TODO: Keep only files whose metadata.updated date is after your threshold.
        console.log('File named ' + metadata.name +
                    ' last updated on: ' + metadata.updated);
      });
    });
  }
});

Then it's up to you to add a condition inside the getMetadata callback so that only files updated after a certain metadata.updated threshold are listed or kept.
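For example, the forEach loop above could become something like this (the cutoff date is only an illustration):

const threshold = new Date('2020-01-01T00:00:00Z'); // hypothetical cutoff

files.forEach(function (file) {
  file.getMetadata(function (err, metadata) {
    // Keep only files whose last update is strictly after the threshold.
    if (!err && new Date(metadata.updated) > threshold) {
      console.log(metadata.name + ' updated on: ' + metadata.updated);
    }
  });
});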

There's no direct API support for this. However, if you need to run this query frequently, you could build your own index: a small application that subscribes to notifications about changes to the bucket, stores a record per object, and exposes an API method that retrieves objects sorted by date.
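A rough sketch of the wiring, assuming the Pub/Sub topic bucket-changes and the subscription bucket-changes-sub are names you create yourself beforehand:

const {Storage} = require('@google-cloud/storage');
const {PubSub} = require('@google-cloud/pubsub');

const storage = new Storage();
const pubsub = new PubSub();

// One-time setup: route OBJECT_FINALIZE (upload/overwrite) events to a topic.
async function setup() {
  await storage.bucket('my-bucket').createNotification('bucket-changes', {
    eventTypes: ['OBJECT_FINALIZE'],
  });
}

// The indexer: each message carries the changed object's metadata as JSON.
function listen() {
  pubsub.subscription('bucket-changes-sub').on('message', (message) => {
    const object = JSON.parse(message.data.toString());
    // TODO: upsert {name: object.name, updated: object.updated} into your index.
    console.log(object.name, object.updated);
    message.ack();
  });
}

The index itself (for instance a table keyed by object name with an updated column) is what your API method would then query, filtered or sorted by date.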

Here is a Python version with a prefix filter:

import argparse

from google.cloud import storage


def list_blobs_with_prefix(bucket_name, prefix, tsAfter):
    """Print blobs under the prefix that were updated after the Unix timestamp."""
    storage_client = storage.Client()

    blobs = storage_client.list_blobs(bucket_name, prefix=prefix, delimiter=None)

    for blob in blobs:
        # blob.updated is a timezone-aware datetime; compare as Unix seconds.
        if blob.updated.timestamp() > int(tsAfter):
            print(blob.name, blob.updated)


def main(bucket, prefix, tsAfter):
    list_blobs_with_prefix(bucket, prefix, tsAfter)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__,
                                     formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('bucket', help='Your Cloud Storage bucket.')
    parser.add_argument('prefix', help='Prefix to match.')
    parser.add_argument('tsAfter', help='Unix timestamp after which to list the files.')

    args = parser.parse_args()

    main(args.bucket, args.prefix, args.tsAfter)
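Assuming the script is saved as list_blobs.py and the timestamp is given in Unix seconds, a run could look like:

python list_blobs.py my-bucket logs/ 1577836800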
