
Is there any way to extract files from an S3 bucket using ls and wildcards? (AWS CLI)

I have a couple of issues trying to list matching files within a directory in my s3 bucket.

I have been using ls to create a list of all my files within a specific directory (including timestamps and file sizes) and appending the output to a file, for example:

     aws s3 --profile mybucket1 ls s3://my-bucket1/directory1/directory2/ >> list.txt

This is working both quickly and as intended. However, I also have some files within directory1 that I want to list too, but only certain files that match a specific string with wildcards around it.

The closest I have got to achieving this is using cp (with --exclude and --include) instead of ls, because ls does not support the --exclude or --include options. For example:

     aws s3 --profile mybucket1 cp s3://my-bucket1/directory1/ /path/to/local/directory/ --exclude "*" --include "*match-me*" --recursive

However, it is taking forever to copy even a single file across (I don't get any stdout, as I believe it scans the whole directory looking for the matching string before it copies anything). I'm also unable to remove --recursive, as I get an error without it, even though I only want to copy files directly within directory1.

My preference would be to use an ls for this as it's much faster and I don't have to then do another ls on the downloaded files to produce the same format as the files I listed and appended from directory2.

Also, I am running all these commands within a bash script, in case that makes any difference.

I hope this makes sense and if anybody has any possible solutions to this, it would be much appreciated.

You should look at aws s3api list-objects.

  1. If your pattern is at the start of the key and you want to match against match-me* , you can use the --prefix argument:

     aws s3api list-objects --bucket <bucket> --prefix "match-me" 

    will return all keys starting with match-me

  2. If your pattern is in the middle of the key name and you want to match against *match-me* , you can run a query against the results:

     aws s3api list-objects --bucket test-bucket-fh --query "Contents[?contains(Key, 'match-me')]" 
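Both options can be combined for the case in the question. The following is a minimal sketch, not a definitive implementation: list_matching is a hypothetical helper name, and the bucket, directory and profile names in the usage line are taken from the question. It assumes the key prefix is given with a trailing slash. --prefix narrows the listing to one directory, --delimiter "/" keeps it from descending into subdirectories such as directory2, and --query filters on the substring and picks the timestamp, size and key columns.

```shell
# list_matching BUCKET PREFIX SUBSTRING [PROFILE]
# Hypothetical helper: prints "LastModified<TAB>Size<TAB>Key" for every
# object directly under PREFIX whose key contains SUBSTRING.
list_matching() {
  # --prefix    limits the listing to keys starting with PREFIX
  # --delimiter stops at the next "/" so subdirectories are not listed
  # --query     filters client-side and selects the output columns
  aws s3api list-objects \
    --profile "${4:-default}" \
    --bucket "$1" \
    --prefix "$2" \
    --delimiter "/" \
    --query "Contents[?contains(Key, '$3')].[LastModified, Size, Key]" \
    --output text
}

# Example call (assumed names from the question), appending to the same file:
#   list_matching my-bucket1 directory1/ match-me mybucket1 >> list.txt
```

The columns are similar to what aws s3 ls prints (timestamp, size, name), but the timestamp format differs and the full key is shown rather than the file name alone, so the lines in list.txt will not be identical in layout. The contains() filter is applied by the CLI after the listing is fetched, so only --prefix reduces the amount of data requested from S3.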

If your need to list the contents of Amazon S3 is not urgent (e.g. you only do it once per day), then you can use Amazon S3 Inventory:

Amazon S3 inventory provides a comma-separated values (CSV) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).

It provides a daily/weekly file in Amazon S3 containing a list of all objects. Additional charges apply.
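Once an inventory file has been downloaded, it can be filtered locally. The sketch below makes several assumptions: the file is a gzip-compressed CSV with every field quoted, the key is the second column (the real column order is listed in the inventory's manifest.json), and filter_inventory is a hypothetical helper name. Keys in the CSV may be URL-encoded, so a pattern containing special characters might need encoding first.

```shell
# filter_inventory FILE SUBSTRING
# Hypothetical helper: prints the inventory rows whose key (assumed to be
# the second column) contains SUBSTRING as a literal string.
filter_inventory() {
  # Splitting on the three characters "," leaves the bare key in field 2;
  # index() does a plain substring test, not a regular expression match.
  gunzip -c "$1" | awk -F'","' -v pat="$2" 'index($2, pat) > 0'
}

# Example call (assumed local file name):
#   filter_inventory inventory.csv.gz match-me >> list.txt
```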
