Efficient way to extract files and meta data from Amazon S3?

Is there a more efficient way to list files from a bucket in Amazon S3 and also extract the meta data for each of those files? I'm using the AWS PHP SDK.

if ($paths = $s3->get_object_list('my-bucket')) {
    foreach($paths AS $path) {
        $meta = $s3->get_object_metadata('my-bucket', $path);
        echo $path . ' was modified on ' . $meta['LastModified'] . '<br />';
    }
}

At the moment I need to run get_object_list() to list all the files and then get_object_metadata() for each file to get its meta data.

If I have 100 files in my bucket, it makes 101 calls to get this data. It would be good if it were possible to do it in one call.

For example:

if ($paths = $s3->get_object_list('my-bucket')) {
    foreach($paths AS $path) {
        echo $path['FileName'] . ' was modified on ' . $path['LastModified'] . '<br />';
    }
}

I know this is a bit old, but I ran into this problem and solved it by extending the AWS SDK to use its batch functionality for this kind of task. It makes retrieving custom metadata for lots of files a lot quicker. This is my code:

    /**
     * Steves_Amazon_S3
     *
     * Extends the AmazonS3 class with a method that retrieves a list of
     * files and their custom metadata more efficiently by using the
     * CFBatchRequest functionality.
     */
    class Steves_Amazon_S3 extends AmazonS3 {

        public function get_object_metadata_batch($bucket, $filenames, $opt = null) {
            $batch = new CFBatchRequest();

            // Queue a HEAD request for each object so all of the headers
            // come back in a single batched round trip
            foreach ($filenames as $filename) {
                $this->batch($batch)->get_object_headers($bucket, $filename);
            }

            $response = $this->batch($batch)->send();

            // Fail if any requests were unsuccessful
            if (!$response->areOK()) {
                return false;
            }

            $result = array();
            foreach ($response as $file) {
                $temp = array();
                $temp['name'] = (string) basename($file->header['_info']['url']);
                $temp['etag'] = (string) $file->header['etag'];
                $temp['size'] = $this->util->size_readable((integer) $file->header['content-length']);
                $temp['size_raw'] = $file->header['content-length'];
                $temp['last_modified'] = (string) date("jS M Y H:i:s", strtotime($file->header['last-modified']));
                $temp['last_modified_raw'] = strtotime($file->header['last-modified']);
                // Custom x-amz-meta-* headers may be absent, so suppress notices
                @$temp['creator_id'] = (string) $file->header['x-amz-meta-creator'];
                @$temp['client_view'] = (string) $file->header['x-amz-meta-client-view'];
                @$temp['user_view'] = (string) $file->header['x-amz-meta-user-view'];

                $result[] = $temp;
            }

            return $result;
        }
    }
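A minimal usage sketch for the class above, assuming SDK 1.x credentials are already configured (the bucket name is the question's hypothetical one):

    $s3 = new Steves_Amazon_S3();

    // One call to list the keys, then one batched round of HEAD requests
    $keys = $s3->get_object_list('my-bucket');
    $files = $s3->get_object_metadata_batch('my-bucket', $keys);

    if ($files !== false) {
        foreach ($files as $file) {
            echo $file['name'] . ' was modified on ' . $file['last_modified'] . '<br />';
        }
    }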

You need to know that the list_objects function has a limit: it won't return more than 1000 objects per call, even if the max-keys option is set to a larger number.

To work around this you need to load the data in several requests:

private function _getBucketObjects($prefix = '', $booOneLevelOny = false)
{
    $objects = array();
    $lastKey = null;
    do {
        $args = array();

        // Continue the listing from the last key returned by the previous page
        if (isset($lastKey)) {
            $args['marker'] = $lastKey;
        }

        if (strlen($prefix)) {
            $args['prefix'] = $prefix;
        }

        // A '/' delimiter limits the listing to one "directory" level
        if ($booOneLevelOny) {
            $args['delimiter'] = '/';
        }

        $res = $this->_client->list_objects($this->_bucket, $args);
        if (!$res->isOK()) {
            return null;
        }

        foreach ($res->body->Contents as $object) {
            $objects[] = $object;
            $lastKey = (string)$object->Key;
        }

        // S3 sets IsTruncated while there are more keys left to fetch
        $isTruncated = (string)$res->body->IsTruncated;
        unset($res);
    } while ($isTruncated == 'true');

    return $objects;
}

As a result, you'll have a full list of the objects.
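A minimal sketch of how the helper above might be called from inside the same class (the prefix values are hypothetical):

    // Every object under a prefix, however many 1000-key pages that takes
    $objects = $this->_getBucketObjects('uploads/2012/');

    // Only the keys directly under the prefix; with the '/' delimiter, S3
    // collapses deeper keys into common prefixes, which this helper ignores
    $topLevel = $this->_getBucketObjects('uploads/', true);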


What if you have some custom headers? They will not be returned by the list_objects function. In that case this will help:

// $arrObjects is the list of objects returned by _getBucketObjects() above
$arrHeaders = array();
foreach (array_chunk($arrObjects, 1000) as $object_set) {
    $batch = new CFBatchRequest();
    foreach ($object_set as $object) {
        // Queue a HEAD request for each object (skip "folder" placeholder keys)
        if (!$this->isFolder((string)$object->Key)) {
            $this->_client->batch($batch)->get_object_headers($this->_bucket, $this->preparePath((string)$object->Key));
        }
    }

    $response = $this->_client->batch($batch)->send();

    // Collect the full header array (including any x-amz-meta-* values) per object
    if ($response->areOK()) {
        foreach ($response as $arrHeaderInfo) {
            $arrHeaders[] = $arrHeaderInfo->header;
        }
    }
    unset($batch, $response);
}
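Each entry in $arrHeaders is then the plain header array for one object, so custom metadata can be read directly; a small sketch (the x-amz-meta-creator name is borrowed from the first answer and purely illustrative):

    foreach ($arrHeaders as $header) {
        if (isset($header['x-amz-meta-creator'])) {
            echo basename($header['_info']['url']) . ' was uploaded by ' . $header['x-amz-meta-creator'] . '<br />';
        }
    }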

I ended up using the list_objects function, which pulled out the LastModified meta data I required.

All in one call :)
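For completeness, a sketch of what that looks like with the SDK 1.x list_objects call (the bucket name is the question's hypothetical one; as noted above, a single call returns at most 1000 keys):

    $res = $s3->list_objects('my-bucket');
    if ($res->isOK()) {
        foreach ($res->body->Contents as $object) {
            // Key, LastModified, ETag and Size are all part of the listing itself
            echo (string)$object->Key . ' was modified on ' . (string)$object->LastModified . '<br />';
        }
    }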

