s3fs and Python os.walk

Question

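I am trying to work out a way to read images from an S3 bucket. Right now my setup mounts the bucket with s3fs, and then a Python script uses os.walk to go through each individual image and do some operations on it with numpy.

However, the output of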

os.walk("mnt/")

is nothing! The command does not see any files in the mounted drive, but if I point to an image manually with

plt.imread("mnt/path/to/file")

I get the image. I am at my wits' end trying to figure this out. Any ideas?

Answer 1

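A bucket mounted from S3 does not behave like normal files and directories in a file system, so calls like os.walk will not work the way you expect. Your best bet is to use a library to search and interact with the S3 bucket from within Python itself.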
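Before switching libraries, it can help to check whether os.walk is hiding an error rather than finding an empty directory. By default os.walk ignores errors raised while listing a directory, so a wrong relative path such as "mnt/" or an unreadable mount simply yields nothing. The sketch below is only a diagnostic aid; the helper name walk_reporting_errors is hypothetical and not part of the original post.

```python
import os


def walk_reporting_errors(root):
    """Walk root like os.walk, but collect the errors os.walk would otherwise ignore."""
    errors = []
    files = []
    # onerror is called with the OSError raised while listing a directory
    for dirpath, dirnames, filenames in os.walk(root, onerror=errors.append):
        for name in filenames:
            files.append(os.path.join(dirpath, name))
    return files, errors
```

If the returned error list is not empty, the problem is with the path or the mount itself, not with the images.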

I would suggest looking into boto, which has a lot of tools for interacting with AWS. Also take a look at the AWS SDK for Python.

Boto: https://github.com/boto/boto
AWS SDK for Python: https://aws.amazon.com/sdk-for-python/
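As a rough illustration of listing a bucket from Python without mounting it, here is a minimal sketch. It assumes boto3 (the current AWS SDK for Python) rather than the legacy boto package linked above. The function name iter_image_keys, the suffix list, and the bucket name in the usage comment are all hypothetical.

```python
def iter_image_keys(client, bucket, prefix="", suffixes=(".png", ".jpg", ".jpeg")):
    """Yield the keys under prefix whose names end with one of the suffixes."""
    # list_objects_v2 returns results in pages; the paginator fetches each page in turn
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # a page with no matching objects has no "Contents" entry
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.lower().endswith(suffixes):
                yield key


# Usage (needs boto3 and AWS credentials; "my-bucket" is a placeholder):
# import boto3
# for key in iter_image_keys(boto3.client("s3"), "my-bucket"):
#     print(key)
```

The client is passed in as an argument so the same function can be exercised with a stub object in tests.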

Answer 2

You can do:

s3 = s3fs.S3FileSystem()
for dirpath, dirname, filename in s3.walk(<your bucket name>):
    # mind how many directory levels your bucket has
    for filename in filenames:
        file_path = f'{dirpath}{filepath}'
        with s3.open(file_path, 'rb') as f:
            # do your numpy stuff with the "f" object

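The code above walks the whole bucket, which is fine only if the files sit at the root of the bucket. If there are directories above them, add an if statement, for example:

if len(dirpath.split('/')) == <depth of the directory with the files>: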

Answer 3

There are a few mistakes here. I think {dirpath}{filepath} on line 5 should be {dirpath}/{filename}, and filename on line 2 should be filenames, but other than that this was helpful!
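With both corrections applied, the loop from Answer 2 might look like the sketch below. It is written against any object that offers a walk method yielding (dirpath, dirnames, filenames) tuples, which s3fs.S3FileSystem is assumed to do here. The function name iter_files, the depth parameter, and the bucket name in the usage comment are hypothetical.

```python
def iter_files(fs, root, depth=None):
    """Yield the full path of every file that fs.walk finds under root.

    If depth is given, only directories whose path has exactly that many
    '/'-separated parts are considered (the bucket name counts as one part).
    """
    for dirpath, dirnames, filenames in fs.walk(root):
        if depth is not None and len(dirpath.split("/")) != depth:
            continue
        for filename in filenames:
            yield f"{dirpath}/{filename}"


# Usage (needs s3fs and AWS credentials; "my-bucket" is a placeholder):
# import s3fs
# s3 = s3fs.S3FileSystem()
# for path in iter_files(s3, "my-bucket"):
#     with s3.open(path, "rb") as f:
#         data = f.read()  # hand the bytes to numpy or an image decoder
```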

Answer 4

As an alternative, I implemented something similar to os.walk() using only boto3.

See my answer on a related question.
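The linked answer is not reproduced here, so the following is not that implementation. It is only one possible sketch of an os.walk-like generator on top of boto3, assuming the list_objects_v2 paginator with Delimiter="/" so that S3 groups keys into "directories" (CommonPrefixes). The name s3_walk is hypothetical, and prefix is assumed to be either empty or to end with "/".

```python
def s3_walk(client, bucket, prefix=""):
    """Yield (prefix, subdirectory_names, file_names) top-down, like os.walk."""
    paginator = client.get_paginator("list_objects_v2")
    dirs, files = [], []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # CommonPrefixes are the "subdirectories" directly under prefix
        for common in page.get("CommonPrefixes", []):
            dirs.append(common["Prefix"][len(prefix):].rstrip("/"))
        # Contents are the objects directly under prefix
        for obj in page.get("Contents", []):
            name = obj["Key"][len(prefix):]
            if name:  # skip a placeholder object whose key equals the prefix itself
                files.append(name)
    yield prefix, dirs, files
    for d in dirs:
        yield from s3_walk(client, bucket, f"{prefix}{d}/")


# Usage (needs boto3 and AWS credentials; "my-bucket" is a placeholder):
# import boto3
# for prefix, dirs, files in s3_walk(boto3.client("s3"), "my-bucket"):
#     print(prefix, dirs, files)
```

Each level costs at least one listing request, so for very deep buckets a single flat listing may be cheaper.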

Answer 1
1 2016-05-15 08:53:20

Answer 2
0 2020-06-25 01:04:44

Answer 3
0 2020-12-29 15:01:12

Answer 4
0 2021-01-28 22:28:05
