
How to add partitions not in dynamic frame while writing data to S3 in AWS Glue script

While writing data to S3 using a dynamic frame, I want to use partitioning columns that are not in the dynamic frame.

For example:

def write_date(outpath, year):
    # year is a value such as "2021"; it is not a column of projectedEvents
    glue_context.write_dynamic_frame.from_options(
        frame=projectedEvents,
        connection_type="s3",
        connection_options={"path": outpath, "partitionKeys": [year]},
        format="parquet")

Here year is a parameter that is not present in the dynamic frame.

This code fails with the error: 'partition column "2021" not found in schema'.

How can I write data to S3 using my own partitions?

Basically, I want to write to the S3 path as "outpath/2021/<parquet_file>".

This would work:

from pyspark.sql.functions import lit

# Add the partition value as a literal column on the frame, then partition on that column
projectedEvents = projectedEvents.withColumn('year', lit(2021))

def write_date(frame, outpath, year):
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": outpath, "partitionKeys": [year]},
        format="parquet")

write_date(projectedEvents, outpath, 'year')

I would suggest that you take another look at partitioning. The partition key has to be a column of the frame being written.
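Since write_dynamic_frame.from_options expects a DynamicFrame while withColumn and lit are Spark DataFrame operations, here is a minimal end-to-end sketch assuming projectedEvents starts out as a DynamicFrame (as in the question): convert it to a DataFrame, add the constant year column, convert it back with DynamicFrame.fromDF, and write with "year" as the partition key. The helper name write_with_year and the transformation context string are placeholders, not part of the original code.

from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import lit

def write_with_year(dyf, outpath, year_value):
    # Convert to a Spark DataFrame so the constant partition column can be added
    df = dyf.toDF().withColumn("year", lit(year_value))
    # Convert back to a DynamicFrame before handing it to the Glue writer
    dyf_with_year = DynamicFrame.fromDF(df, glue_context, "dyf_with_year")
    glue_context.write_dynamic_frame.from_options(
        frame=dyf_with_year,
        connection_type="s3",
        connection_options={"path": outpath, "partitionKeys": ["year"]},
        format="parquet")

write_with_year(projectedEvents, outpath, 2021)

Note that partitionKeys writes Hive-style directories, so the objects land under outpath/year=2021/<parquet_file> rather than outpath/2021/<parquet_file>. If the bare outpath/2021/ layout is strictly required, appending the value to the path itself (e.g. "path": outpath + "/2021") and omitting partitionKeys is another option.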
