
aws firehose to s3 bucket partitioning name like year=YYYY, month=MM, day=DD, hour=HH

Currently, AWS Firehose writes data to S3 under a default folder layout of YYYY/MM/DD/HH, e.g. 2017/10/26/18

But I would like the layout to look like this:

Year=2017/Month=10/Day=26/Hour=18

Is there a way to make Firehose use the layout above by default?

I tried using an SNS topic to invoke a Lambda that renames the folders to year=yyyy, month=mm, etc., but Firehose takes some time to create the default partitioned folders. So I am not sure how to do this without possible conflicts, such as the Lambda being called before the folder has been created.
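For reference, the renaming step described above amounts to a pure key-to-key mapping. The sketch below is only an illustration: the function name `hive_style_key` and the sample file name are hypothetical, and it assumes the object key ends with the default YYYY/MM/DD/HH/ layout followed by a file name. S3 has no real folders, only object keys, so a rename like this would operate on each delivered object rather than on a folder.

```python
import re

# Matches keys that end in the default Firehose layout: [base/]YYYY/MM/DD/HH/<file>
DEFAULT_KEY = re.compile(
    r"^(?P<base>(?:.*/)?)(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/(?P<h>\d{2})/(?P<name>[^/]+)$"
)


def hive_style_key(key):
    """Return the Year=/Month=/Day=/Hour= form of a default-layout key.

    Returns None when the key does not follow the assumed layout.
    """
    match = DEFAULT_KEY.match(key)
    if match is None:
        return None
    return "{base}Year={y}/Month={m}/Day={d}/Hour={h}/{name}".format(
        **match.groupdict()
    )


if __name__ == "__main__":
    # Hypothetical file name, used only to exercise the mapping.
    print(hive_style_key("2017/10/26/18/sample-file.gz"))
```

The mapping does not touch S3 at all; copying the object to the new key and deleting the old one would still be needed, which is part of why a prefix configured on the delivery stream is the simpler route.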

Ideally there would be a built-in AWS way to handle this, but I have not found one yet.

Any suggestions would be appreciated. Thanks!

Using Dynamic Partitioning, you can use the following expression in the S3 bucket prefix on the Kinesis Firehose configuration:

input/kinesis-realtime/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/

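Set the S3 prefix option to 'year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/' to get a folder structure like year=2017/month=10/day=26/hour=18. Use lowercase yyyy for the year: the timestamp patterns follow Java's DateTimeFormatter, where uppercase YYYY means the week-based year and can give the wrong year around New Year.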
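To check what such a prefix would produce before applying it, a small local simulation can help. This is a minimal sketch, not Firehose's own implementation: `expand_prefix` is a hypothetical helper that only understands the four patterns used here (yyyy, MM, dd, HH). The `s3_settings` dictionary shows where the prefix might go, assuming the `Prefix` and `ErrorOutputPrefix` fields of the extended S3 destination configuration; the `errors/` path is made up for illustration.

```python
import re
from datetime import datetime, timezone

# Custom prefix using the timestamp namespace, including the hour level.
PREFIX = (
    "year=!{timestamp:yyyy}/month=!{timestamp:MM}/"
    "day=!{timestamp:dd}/hour=!{timestamp:HH}/"
)

# Only the patterns used above; this is not a full DateTimeFormatter port.
_JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}


def expand_prefix(prefix, when):
    """Locally expand !{timestamp:...} expressions for a given datetime."""
    def replace(match):
        return when.strftime(_JAVA_TO_STRFTIME[match.group(1)])

    return re.sub(r"!\{timestamp:([^}]+)\}", replace, prefix)


# Assumed placement in the delivery stream's S3 destination settings.
# A prefix containing expressions also needs an error output prefix.
s3_settings = {
    "Prefix": "input/kinesis-realtime/" + PREFIX,
    "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/" + PREFIX,
}

if __name__ == "__main__":
    sample = datetime(2017, 10, 26, 18, tzinfo=timezone.utc)
    print(expand_prefix(s3_settings["Prefix"], sample))
```

Firehose evaluates the timestamp in UTC, so the simulation uses a UTC datetime as well.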
