PowerShell for AWS: List only "folders" from S3 bucket?

Is there any easy way to use PowerShell to only get a list of "folders" from an S3 bucket, without listing every single object and just scripting a compiled list of distinct paths? There are hundreds of thousands of individual objects in the bucket I'm working in, and that would take a very long time.

It's possible this is a really stupid question and I'm sorry if that's the case, but I couldn't find anything on Google or SO to answer this. I've tried adding wildcards to -KeyPrefix and -Key params of Get-S3Object to no avail. That's the only cmdlet that seems like it might be capable of doing what I'm after.

Pointless backstory: I just want to make sure I'm transferring files to the correct, existing folders. I'm a contracted third party, so I don't have console login access and I'm not the person who maintains the AWS account.

I know this is possible using Java and C# and others, but I'm doing everything else involved with this fairly simple project in PS and was hoping to be able to stick with it.

Thanks in advance.

You can use the AWS Tools for PowerShell to list objects (via Get-S3Object) in the bucket and pull the common prefixes from the response object.

Below is a small library to recursively retrieve subdirectories:

function Get-Subdirectories
{
  param
  (
    [string] $BucketName,
    [string] $KeyPrefix,
    [bool] $Recurse
  )

  # Query one level below $KeyPrefix; the delimiter makes S3 group deeper keys into CommonPrefixes
  Get-S3Object -BucketName $BucketName -KeyPrefix $KeyPrefix -Delimiter '/' | Out-Null

  # Capture the prefixes now, because the recursive calls below overwrite $AWSHistory.LastCommand
  $prefixes = $AWSHistory.LastCommand.Responses.Last.CommonPrefixes

  if($prefixes.Count -eq 0)
  {
    return
  }

  $prefixes

  if($Recurse)
  {
    $prefixes | ForEach-Object { Get-Subdirectories -BucketName $BucketName -KeyPrefix $_ -Recurse $Recurse }
  }
}

function Get-S3Directories
{
  param
  (
    [string] $BucketName,
    [bool] $Recurse = $false
  )

  Get-Subdirectories -BucketName $BucketName -KeyPrefix '/' -Recurse $Recurse
}

This recursive function works by updating the KeyPrefix on each call, so that it checks for subdirectories under each KeyPrefix passed to it. With the delimiter set to '/', all keys that share the same text between the KeyPrefix and the next occurrence of the delimiter are rolled up into a single entry in the CommonPrefixes collection of the last response in $AWSHistory.
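
To make the roll-up concrete, here is a minimal local sketch of the grouping that the delimiter produces. It does not call AWS at all: Get-CommonPrefixSketch and the sample keys are hypothetical names made up for illustration, and the real grouping is done server-side by S3.

```powershell
# Hypothetical helper: mimics, on a local list of keys, how keys are rolled
# up into common prefixes for a given prefix and delimiter.
function Get-CommonPrefixSketch
{
  param
  (
    [string[]] $Keys,
    [string] $KeyPrefix = '',
    [string] $Delimiter = '/'
  )

  $Keys |
    Where-Object { $_.StartsWith($KeyPrefix, [System.StringComparison]::Ordinal) } |
    ForEach-Object {
      # Look for the first delimiter that appears after the prefix
      $index = $_.IndexOf($Delimiter, $KeyPrefix.Length, [System.StringComparison]::Ordinal)
      if ($index -ge 0)
      {
        # Everything up to and including that delimiter is the common prefix
        $_.Substring(0, $index + $Delimiter.Length)
      }
    } |
    Sort-Object -Unique
}

# Made-up sample keys
$sampleKeys = @(
  'myprefix/txt/a.txt',
  'myprefix/txt/b.txt',
  'myprefix/img/c.png',
  'myotherprefix/d.txt',
  'rootfile.txt'
)

Get-CommonPrefixSketch -Keys $sampleKeys                         # top level
Get-CommonPrefixSketch -Keys $sampleKeys -KeyPrefix 'myprefix/'  # one level down
```

Keys with no delimiter after the prefix (such as 'rootfile.txt' at the top level) produce no common prefix, which is why only "folders" come back.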

To retrieve only the top-level directories in an S3 Bucket:

PS C:/> Get-S3Directories -BucketName 'myBucket'

To retrieve all directories in an S3 Bucket:

PS C:/> Get-S3Directories -BucketName 'myBucket' -Recurse $true

This will return a collection of strings, where each string is a common prefix.

Example Output:

myprefix/
myprefix/txt/
myprefix/img/
myotherprefix/
...
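
# Lists every object in the bucket, then derives the distinct parent paths.
# Note: this enumerates all keys, so it is slow on very large buckets.
$objects = Get-S3Object -BucketName $bucketname -ProfileName $profilename -Region $region
$paths = foreach($object in $objects)
{
    # Split-Path returns the parent "folder" portion of the key
    Split-Path $object.Key -Parent
}
$paths = @($paths | Select-Object -Unique)
Write-Host "`nNumber of folders: $($paths.Count)"
Write-Host ($paths -join "`n")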

This PowerShell version pages through all the keys in a single S3 bucket. A single listing request returns at most 1,000 keys, so a do-while loop is needed to get past the first 1,000 keys (and the folders they belong to). After the output is written to the CSV file, remember to remove the duplicates, for example in Excel: the script only deduplicates within each batch of 1,000 keys, so the same folder can appear more than once across batches. (P.S. If anyone can help with removing the duplicates in the script itself, that would be welcome.)

#Main-Code 
$keysPerPage = 1000 #Set max key of AWS limit of 1000
$bucketN = 'testBucket' #Bucketname
$nextMarker = $null 
$output =@()
$Start = "S3 Bucket Name : $bucketN"
$End = "- End of Folder List -"

Do
{
  #Iterate 1000 records per do-while loop, this is to overcome the limitation of only 1000 keys retrieval per get-s3object calls by AWS 
  $batch = get-s3object -BucketName $bucketN -Maxkey $keysPerPage -Marker $nextMarker 

  $batch2 = $batch.key | % {$_.Split('/')[0]} | Sort -Unique 
  $output += $batch2 
  $batch2

  $nextMarker= $AWSHistory.LastServiceResponse.NextMarker
} while ($nextMarker)

#Output to specific folder in a directory
$Start | Out-File C:\Output-Result.csv -Append
$output | Out-File C:\Output-Result.csv -Append
$End | Out-File C:\Output-Result.csv -Append
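
On the duplicates: the loop above deduplicates within each batch only, so a folder whose keys span two batches is appended twice. One way to handle that, sketched below, is to deduplicate the combined list once before writing it out. The batch contents here are made-up stand-ins for what the loop collects.

```powershell
# Made-up stand-ins for the per-batch results collected by the loop
$batchA = @('docs', 'images', 'logs')
$batchB = @('logs', 'reports')   # 'logs' spans both batches

$output = @()
$output += $batchA
$output += $batchB

# Deduplicate across all batches in one pass
$uniqueFolders = @($output | Sort-Object -Unique)

$uniqueFolders
```

In the script above, the same Sort-Object -Unique call would go between the loop and the Out-File lines. Since S3 keys are case-sensitive and Sort-Object compares case-insensitively by default, adding its -CaseSensitive switch may be appropriate.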

The accepted answer is correct, but it has a flaw. If you have a large bucket with many "folders" (over 1,000), you will only get the prefixes from the last response (at most 1,000) by using:

$AWSHistory.LastCommand.Responses.Last.CommonPrefixes

AWS returns responses in batches of 1,000. If you look at

$AWSHistory.LastCommand.Responses.History 

you will see multiple entries, although only 5 are kept by default. You can change that behavior with the Set-AWSHistoryConfiguration cmdlet.

To increase the number of History responses use the -MaxServiceCallHistory parameter.

Set-AWSHistoryConfiguration -MaxServiceCallHistory 20

This will store the last 20 service calls for the next command and all subsequent commands.

With the above configuration, you could retrieve up to 20,000 subfolders from a folder (20 responses of up to 1,000 prefixes each).

To retrieve all the folders, do the following:

$subFolders = ($AwsHistory.LastCommand.Responses.History).CommonPrefixes

Caution: Increasing these history settings will use more memory.
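
The expression above relies on PowerShell's member enumeration: reading a property on a collection returns that property from every element, flattened into one list. The small local sketch below uses made-up objects in place of the real response records (the actual objects stored in $AWSHistory are richer than this) to show the effect:

```powershell
# Made-up stand-ins for two paged responses, each carrying its own CommonPrefixes
$fakeResponses = @(
  [pscustomobject]@{ CommonPrefixes = @('a/', 'b/') },
  [pscustomobject]@{ CommonPrefixes = @('c/') }
)

# Member enumeration collects CommonPrefixes from every response (PowerShell 3.0+)
$allPrefixes = @($fakeResponses.CommonPrefixes)

$allPrefixes.Count
$allPrefixes
```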
