Reading multiple csv files in AWS Sagemaker from a location in an Amazon S3 Bucket
I have multiple csv files in a location in S3. The names of those files follow a date format.
Example: 2021_09_30_Output.csv
I need to understand how I can read all the files in this folder while selecting only the dates that I require. An example would be reading only the files from September, i.e. "2021_09_*.csv", which would match only the files from that month.
Would appreciate the help. Thanks
You can create a function that returns all files from a particular date onwards, using the datetime library together with the naming convention of your files. The following snippet can get you started:
import datetime

import boto3

s3 = boto3.resource('s3')
BUCKET_NAME = 'name'

def get_files_after(bucket, date):
    # List every object in the bucket and keep those dated after `date`.
    files = []
    for obj in s3.Bucket(bucket).objects.all():
        # Keys look like '2021_09_30_Output.csv'; the first 10
        # characters hold the date part.
        file_date = datetime.datetime.strptime(obj.key[:10], '%Y_%m_%d')
        if file_date > date:
            files.append(obj)
    return files

september_1 = datetime.datetime(2021, 9, 1)
files = get_files_after(BUCKET_NAME, september_1)
for file in files:
    # objects.all() yields ObjectSummary items; call get() to download the body.
    contents = file.get()['Body'].read()
    contents = contents.decode("utf-8")
    ...
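If you only ever need a single month, a simpler and cheaper option is to filter by key prefix instead of listing the whole bucket; with boto3 that would be `s3.Bucket(bucket).objects.filter(Prefix='2021_09_')`. The prefix-matching logic itself can be sketched locally (the helper name `keys_in_month` is hypothetical, and the key format is the one from the question):

```python
def keys_in_month(keys, year, month):
    # Build the 'YYYY_MM_' prefix implied by the naming convention
    # in the question, e.g. '2021_09_'.
    prefix = f"{year:04d}_{month:02d}_"
    return [k for k in keys if k.startswith(prefix)]

keys = ["2021_08_31_Output.csv", "2021_09_01_Output.csv", "2021_09_30_Output.csv"]
keys_in_month(keys, 2021, 9)
# → ['2021_09_01_Output.csv', '2021_09_30_Output.csv']
```

Pushing the prefix filter to S3 via `objects.filter(Prefix=...)` also avoids paying for a full bucket listing on every call.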
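Once an object has been downloaded, the raw bytes still need to be parsed as CSV. A minimal sketch using only the standard library (the helper name `parse_csv_bytes` is hypothetical; `pandas.read_csv` on an `io.BytesIO` would work equally well):

```python
import csv
import io

def parse_csv_bytes(data):
    # Decode the bytes returned by file.get()['Body'].read()
    # and parse them into a list of rows.
    text = data.decode("utf-8")
    return list(csv.reader(io.StringIO(text)))

rows = parse_csv_bytes(b"date,value\n2021_09_01,10\n")
# → [['date', 'value'], ['2021_09_01', '10']]
```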