Check if a path is a file in Amazon S3

I have functionality for listing AWS S3 directories with Scala, and I would like to check whether a listed path is a file or a directory. How can I implement this functionality (an isFile method) using amazon-sdk-s3?

Here is what it looks like:

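  def listContents(): Seq[T] = {
    val paths = s3Client.list(inputPath)
    for {
      // keep only the listed paths that isFile reports as files
      path <- paths if isFile(path.toString).getOrElse(false)
      res <- transform(path.toString)
    } yield res
  }

  // takes the path as a String, matching the call above
  def isFile(path: String): Option[Boolean] = ??? // implementation I need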

Amazon S3 does not have the concept of a 'Directory'.

Instead, the full path of an object is stored in its Key (filename).

For example, an object can be stored in Amazon S3 with a Key of: invoices/2020-09/inv22.txt

This object can be created even if the invoices and 2020-09 directories do not exist. When viewed through the Amazon S3 console, it will appear as though those directories were automatically created, but if the object is deleted, those directories will disappear (because they never existed).
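To make the point concrete, here is a minimal sketch in plain Scala (no SDK involved) that derives the folder-like prefixes a console would display from a set of keys. The object and function names are made up for illustration.

```scala
object ImpliedFolders {
  // Every "folder" is just a prefix of some key that ends at a "/".
  // Nothing is stored for it; it is computed from the keys themselves.
  def impliedFolders(keys: Set[String]): Set[String] =
    keys.flatMap { key =>
      // Each "/" in the key closes one folder-like prefix.
      key.indices
        .filter(i => key.charAt(i) == '/')
        .map(i => key.substring(0, i + 1))
    }
}
```

With the single key invoices/2020-09/inv22.txt this yields invoices/ and invoices/2020-09/; with an empty set of keys it yields nothing, which is why the "directories" vanish once the object is deleted.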

If a user clicks the "Create Folder" button in the Amazon S3 management console, a zero-length object is created with the same name as the folder. This 'forces' the folder to appear even if there are no objects 'inside' the folder. However, it is not actually a folder.

Therefore, it is not possible to "check if the listed path is a file or a directory" because directories do not exist. Instead, I recommend that you assume everything is a 'file' unless it is zero-length.
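A minimal sketch of that recommendation, assuming each listing entry exposes its key and its size in bytes. S3Entry is a hypothetical stand-in for whatever summary type your client returns, not an SDK class.

```scala
object SizeHeuristic {
  // Hypothetical stand-in for one entry of an S3 listing.
  final case class S3Entry(key: String, size: Long)

  // Treat everything as a file unless it is zero-length.
  def isFile(entry: S3Entry): Boolean = entry.size > 0

  // Keep only the entries the heuristic classifies as files.
  def filesOnly(entries: Seq[S3Entry]): Seq[S3Entry] = entries.filter(isFile)
}
```

Note that this heuristic also drops genuinely empty files, so it only fits buckets where zero-length objects are never real data.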

S3 doesn't have the notion of folders commonly found in file systems but instead has a flat structure; more details can be found here.

Generally speaking, elements that don't end in "/" are to be treated as objects. However, while the AWS Web console doesn't allow you to upload files that end in "/", this is possible via the SDK/API:

The Amazon S3 console treats all objects that have a forward slash ("/") character as the last (trailing) character in the key name as a folder, for example examplekeyname/. You can't upload an object that has a key name with a trailing "/" character using the Amazon S3 console. However, you can upload objects that are named with a trailing "/" with the Amazon S3 API by using the AWS CLI, AWS SDKs, or REST API.

The other answer suggests assuming everything is a file unless it is zero-length. This is a good suggestion, but it breaks down if some of your files are expected to be empty and you still need to process them. Below is an example of the response metadata returned for an empty txt file I just uploaded and tried to retrieve:

{
  "HTTPHeaders": {
    "accept-ranges": "bytes",
    "content-length": "0",
    "content-type": "text/plain",
    "date": "Mon, 14 Sep 2020 10:14:29 GMT",
    "etag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
    "last-modified": "Mon, 14 Sep 2020 10:13:52 GMT",
    "x-amz-server-side-encryption": "AES256"
  },
  "HTTPStatusCode": 200,
  "RetryAttempts": 0
}
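One way to keep empty files like the one above is to classify by the trailing "/" instead of by size. The following is only a sketch; S3Entry is again a hypothetical stand-in for a listing entry, not an SDK type.

```scala
object TrailingSlash {
  // Hypothetical stand-in for one entry of an S3 listing.
  final case class S3Entry(key: String, size: Long)

  // A key ending in "/" is what the console displays as a folder.
  def looksLikeFolder(key: String): Boolean = key.endsWith("/")

  // Size is deliberately ignored, so empty files are still treated as files.
  def isFile(entry: S3Entry): Boolean = !looksLikeFolder(entry.key)
}
```

The trade-off is the mirror image of the size check: an object uploaded through the API with a trailing "/" and real content would be classified as a folder here, so which rule fits depends on how the bucket is populated.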

Depending on how you traverse/list your objects, chances are that the response already includes only objects, but in case there's any ambiguity I'd suggest you still attempt to retrieve the object and be prepared to handle an exception. If you try to retrieve a key that is a folder, Amazon S3 will return an HTTP status code 404 ("no such key") error - Docs here.
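A rough sketch of the "try to retrieve and handle the exception" approach, returning an Option[Boolean] so it fits the getOrElse(false) call in the question. Both fetchSize and NoSuchKey are assumptions made for illustration: fetchSize stands for whatever metadata or GET call your client offers, and NoSuchKey stands for the SDK's 404 error.

```scala
import scala.util.{Failure, Success, Try}

object ProbeByFetch {
  // Hypothetical stand-in for the SDK's "no such key" (HTTP 404) error.
  final class NoSuchKey(key: String) extends RuntimeException(s"no such key: $key")

  // `fetchSize` is a hypothetical function wrapping a metadata or GET request.
  def isFile(fetchSize: String => Long)(key: String): Option[Boolean] =
    Try(fetchSize(key)) match {
      case Success(_)            => Some(true)  // the key resolves to a stored object
      case Failure(_: NoSuchKey) => Some(false) // 404: nothing is stored under this key
      case Failure(_)            => None        // any other error: the answer is unknown
    }
}
```

Returning None for unrelated failures (permissions, network) keeps them distinct from a definite "not a file", and the caller decides how to treat them. Keep in mind this costs one request per key.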

It is possible to check whether a folder exists. What I did was use listObjectsV2 for it.

A couple of tricks here:

  1. Append '/' to the folder name and then pass it to listObjectsV2 (as the prefix).
  2. Set the maximum number of returned items to 1.

When you get the list result back, you can easily check whether the object (folder) exists.
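The two tricks can be sketched roughly as follows. To keep the example self-contained, the listing call is abstracted as a function (prefix, maxKeys) => keys. ListKeys and inMemory are made-up names; with a real client you would back that function with a ListObjectsV2 request that sets the prefix and the max-keys value.

```scala
object FolderProbe {
  // Hypothetical abstraction over a ListObjectsV2 call: (prefix, maxKeys) => matching keys.
  type ListKeys = (String, Int) => Seq[String]

  def folderExists(listKeys: ListKeys)(folder: String): Boolean = {
    // Trick 1: end the prefix with "/" so that "inv" does not match "invoices/...".
    val prefix = if (folder.endsWith("/")) folder else folder + "/"
    // Trick 2: a single returned key is enough to prove something lives under the prefix.
    listKeys(prefix, 1).nonEmpty
  }

  // In-memory stand-in for a bucket, used only to exercise the function.
  def inMemory(keys: Seq[String]): ListKeys =
    (prefix, maxKeys) => keys.filter(_.startsWith(prefix)).sorted.take(maxKeys)
}
```

For a bucket holding only invoices/2020-09/inv22.txt, folderExists reports true for invoices and for invoices/2020-09/, and false for inv.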

I will update my code a bit when I have time.
