
Reading multiple csv files in AWS Sagemaker from a location in Amazon S3 Bucket

I have multiple csv files in a location in S3. The file names are in a date format. Example: 2021_09_30_Output.csv

I need to understand how I can read the files in this folder while selecting only the dates that I require. An example would be reading only the files from September, i.e. "2022_09_*.csv", which would read only the files from that month.

Would appreciate the help. Thanks

You can create a function that returns all files from a particular date onwards, using the datetime library and the naming convention of your files. The following snippet can get you started:

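import datetime

import boto3

s3 = boto3.resource('s3')
BUCKET_NAME = 'name'


def get_files_after(bucket, date):
    """Return the objects whose file name starts with a date on or after `date`."""
    files = []
    for obj in s3.Bucket(bucket).objects.all():
        # File names look like '2021_09_30_Output.csv', so the date is
        # the first 10 characters of the name (after any folder prefix).
        name = obj.key.split('/')[-1]
        try:
            file_date = datetime.datetime.strptime(name[:10], '%Y_%m_%d')
        except ValueError:
            continue  # Skip keys that do not follow the naming convention
        if file_date >= date:
            files.append(obj)
    return files


september_1 = datetime.datetime(2021, 9, 1)
files = get_files_after(BUCKET_NAME, september_1)
for file in files:
    # objects.all() yields ObjectSummary items; get() fetches the actual object
    contents = file.get()['Body'].read()
    contents = contents.decode("utf-8")
    ...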
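
Since the question asks for a single month rather than everything after a date, a wildcard match on the file name may be closer to what is needed. The sketch below is one possible way to do that, not a definitive implementation: `select_keys` and `list_month_keys` are hypothetical helper names, the optional `folder` prefix is an assumption about how the bucket is laid out, and the sample keys are made up for illustration.

```python
import fnmatch


def select_keys(keys, pattern):
    """Return the keys whose file name matches a shell-style pattern."""
    return [k for k in keys
            if fnmatch.fnmatchcase(k.split('/')[-1], pattern)]


def list_month_keys(bucket_name, year, month, folder=''):
    """List the csv keys for one month (hypothetical helper).

    Assumes file names start with 'YYYY_MM_' and that `folder` is either
    empty or ends with '/'.
    """
    import boto3  # imported here so select_keys can be used without AWS access

    month_tag = f'{year:04d}_{month:02d}_'
    bucket = boto3.resource('s3').Bucket(bucket_name)
    # Let S3 narrow the listing by key prefix before matching locally
    keys = [obj.key for obj in bucket.objects.filter(Prefix=folder + month_tag)]
    return select_keys(keys, month_tag + '*.csv')


# Made-up keys, only to show the matching step
sample_keys = [
    '2021_08_31_Output.csv',
    '2021_09_01_Output.csv',
    '2021_09_30_Output.csv',
    '2022_09_15_Output.csv',
]
print(select_keys(sample_keys, '2021_09_*.csv'))
# ['2021_09_01_Output.csv', '2021_09_30_Output.csv']
```

Filtering by `Prefix` keeps the listing small when the bucket holds many files, while the pattern match still guards against keys that share the prefix but are not csv files.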

