Reading multiple csv files in AWS Sagemaker from a location in an Amazon S3 Bucket
I have multiple csv files in a location in S3. The names of those files follow a date format.
Example: 2021_09_30_Output.csv
I need to understand how I can read all the files in this folder while selecting only the dates that I require. An example would be reading only the files from September, i.e. "2021_09_*.csv", which would match only the files from that month.
Would appreciate the help. Thanks
You can create a function that returns all files from a particular date onwards, using the datetime library together with the naming convention of your files. The following snippet can get you started:
import datetime

import boto3

s3 = boto3.resource('s3')
BUCKET_NAME = 'name'

def get_files_after(bucket, date):
    # List every object in the bucket and keep those dated after `date`.
    files = []
    for obj in s3.Bucket(bucket).objects.all():
        # Keys look like '2021_09_30_Output.csv'; the first 10
        # characters hold the date part.
        file_date = datetime.datetime.strptime(obj.key[:10], '%Y_%m_%d')
        if file_date > date:
            files.append(obj)
    return files

september_1 = datetime.datetime(2021, 9, 1)
files = get_files_after(BUCKET_NAME, september_1)
for file in files:
    # objects.all() yields ObjectSummary items; call get() to download the body.
    contents = file.get()['Body'].read()
    contents = contents.decode("utf-8")
    ...
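If you only ever need a single month, a simpler and cheaper option is to filter by key prefix instead of listing the whole bucket; with boto3 that would be `s3.Bucket(bucket).objects.filter(Prefix='2021_09_')`. The prefix-matching logic itself can be sketched locally (the helper name `keys_in_month` is hypothetical, and the key format is the one from the question):

```python
def keys_in_month(keys, year, month):
    # Build the 'YYYY_MM_' prefix implied by the naming convention
    # in the question, e.g. '2021_09_'.
    prefix = f"{year:04d}_{month:02d}_"
    return [k for k in keys if k.startswith(prefix)]

keys = ["2021_08_31_Output.csv", "2021_09_01_Output.csv", "2021_09_30_Output.csv"]
keys_in_month(keys, 2021, 9)
# → ['2021_09_01_Output.csv', '2021_09_30_Output.csv']
```

Pushing the prefix filter to S3 via `objects.filter(Prefix=...)` also avoids paying for a full bucket listing on every call.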
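Once an object has been downloaded, the raw bytes still need to be parsed as CSV. A minimal sketch using only the standard library (the helper name `parse_csv_bytes` is hypothetical; `pandas.read_csv` on an `io.BytesIO` would work equally well):

```python
import csv
import io

def parse_csv_bytes(data):
    # Decode the bytes returned by file.get()['Body'].read()
    # and parse them into a list of rows.
    text = data.decode("utf-8")
    return list(csv.reader(io.StringIO(text)))

rows = parse_csv_bytes(b"date,value\n2021_09_01,10\n")
# → [['date', 'value'], ['2021_09_01', '10']]
```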