
Google Cloud Storage paginate objects in a bucket (PHP)

I want to iterate over the objects in a bucket. I really need to paginate this: we have hundreds of thousands of objects in the bucket. Our bucket looks like:

 bucket/MLS ID/file 1
 bucket/MLS ID/file 2
 bucket/MLS ID/file 3
 ... etc

The simplest version of my code follows. I know the value I'm setting into $params['pageToken'] is wrong; I can't figure out how or where to get the right one. $file_objects is a 'Google\Cloud\Storage\ObjectIterator', right?

// temp: pages of 10, out of a total of 100. I really want pages of 100
// out of all (in my test bucket, I have about 700 objects)
$params = [
    'prefix'      => $mls_id,
    'maxResults'  => 10,
    'resultLimit' => 100,
    'fields'      => 'items/id,items/name,items/updated,nextPageToken',
    'pageToken'   => NULL
];

while ( $file_objects = $bucket->objects($params) )
{
    foreach ( $file_objects as $object )
    {
        print "NAME: {$object->name()}\n";
    }

    // I think that this might need to be encoded somehow?
    // or how do I get the requested nextPageToken???
    $params['pageToken'] = $file_objects->nextResultToken(); 

}

So - I don't understand maxResults vs resultLimit. It would seem that resultLimit would be the total that I want to see from my bucket, and maxResults the size of my page. But maxResults doesn't seem to affect anything, while resultLimit does.

maxResults = 100
resultLimit = 10

produces 10 objects.

maxResults = 10
resultLimit = 100

spits out 100 objects.

maxResults = 10
resultLimit = 0

dumps out all 702 in the bucket, with maxResults having no effect at all. And at no point does "$file_objects->nextResultToken();" give me anything.

What am I missing?

The objects method automatically handles pagination for you. It returns an ObjectIterator object.

The resultLimit parameter limits the total number of objects to return across all pages. The maxResults parameter sets the maximum number to return per page.
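To make the difference concrete, here is a small plain-PHP simulation of how the two options interact, assuming the semantics described above (maxResults is the page size, resultLimit is the overall cap). It does not call the Google client, and the function name simulatePageSizes is made up for illustration; it is a sketch of the documented behavior, not the library's implementation.

```php
<?php
// Plain-PHP simulation (no Google client involved) of the documented meaning
// of the two options: maxResults = page size, resultLimit = overall cap.
// A null $resultLimit stands for "option not set", i.e. no cap.
function simulatePageSizes(int $totalObjects, int $maxResults, ?int $resultLimit = null): array
{
    if ($maxResults < 1) {
        throw new InvalidArgumentException('maxResults must be at least 1');
    }

    // Number of objects that will be returned in total.
    $remaining = $resultLimit === null ? $totalObjects : min($totalObjects, $resultLimit);

    $pageSizes = [];
    while ($remaining > 0) {
        $size = min($maxResults, $remaining);
        $pageSizes[] = $size;
        $remaining -= $size;
    }
    return $pageSizes;
}

// 702 objects, pages of 10, capped at 100 in total.
$pages = simulatePageSizes(702, 10, 100);
printf("%d page(s), %d object(s)\n", count($pages), array_sum($pages));

// 702 objects, pages of 100, capped at 10 in total.
$pages = simulatePageSizes(702, 100, 10);
printf("%d page(s), %d object(s)\n", count($pages), array_sum($pages));
```

A plain foreach over the iterator flattens these pages into one stream of objects, which would explain why changing maxResults alone appears to have no effect on what gets printed, while resultLimit does.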

If you use foreach over the ObjectIterator object, it iterates through all objects and fetches further pages as needed. Note that ObjectIterator also has other methods, such as iterateByPage, which lets you work with one page at a time.
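A minimal sketch of page-by-page processing, assuming iterateByPage() yields one array of objects per API page. The helper name forEachPage is hypothetical (it is not part of google/cloud-storage), and the bucket name and prefix in the usage comment are placeholders. The helper accepts any iterable of pages, so it can be tried without credentials.

```php
<?php
// Hypothetical helper (not part of google/cloud-storage): calls $handlePage
// once per page, passing the page and its zero-based index, and returns the
// number of pages seen.
function forEachPage(iterable $pages, callable $handlePage): int
{
    $pageCount = 0;
    foreach ($pages as $page) {
        $handlePage($page, $pageCount);
        $pageCount++;
    }
    return $pageCount;
}

// Usage with the real client (requires credentials; 'my-bucket' and the
// prefix are placeholders):
//
//   $bucket = (new Google\Cloud\Storage\StorageClient())->bucket('my-bucket');
//   $pages  = $bucket->objects(['prefix' => 'my-mls-id/', 'maxResults' => 100])
//                    ->iterateByPage();
//   forEachPage($pages, function ($page, int $pageNumber) {
//       foreach ($page as $object) {
//           print "page {$pageNumber}: {$object->name()}\n";
//       }
//   });
```

Handling each page in a callback keeps only one page in memory at a time, which matters for a bucket with hundreds of thousands of objects.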

Ok, I think I got it. I found the documentation far too sparse and misleading. The code I came up with:

$params = [
    'prefix' => $prefix, // set $prefix to your object-name prefix
    'maxResults' => 100,
    //'resultLimit' => 0,
    'fields' => 'items/id,items/name,items/updated,nextPageToken',
    'pageToken' => NULL
];
// Note: setting 'resultLimit' to 0 does not work, I found the
//   docs misleading. If you want all results, don't set it at all

// Fetch one page per request. The do-while makes sure the last page
//   (the one that comes back without a next-page token) is printed too
do
{
    // Get the set of objects for the current parameters
    $object_iterator = $bucket->objects($params);

    // In order to get the next result token, I had to get the current
    //   object first. If you don't, nextResultToken() always returns
    //   NULL
    $current = $object_iterator->current();
    $next_result_token = $object_iterator->nextResultToken();

    // Print only the page that was just fetched (guarding against an
    //   empty page)
    $object_page_iterator = $object_iterator->iterateByPage();
    foreach ($object_page_iterator->current() ?: [] as $file_object)
    {
        print " -- {$file_object->name()}\n";
    }

    // Here is where you use the page token retrieved above: the next
    //   pass through the loop requests the following set of objects
    $params['pageToken'] = $next_result_token;
    print "NEXT RESULT TOKEN: {$next_result_token}\n";
} while ($next_result_token);

This seems to work for me, so now I can get to the actual problem. Hope this helps someone.
