
How to add partitions not in dynamic frame while writing data to S3 in AWS Glue script

While writing data to S3 using a dynamic frame, I want to use partitioning columns that are not in the dynamic frame.

For example:

def write_date(outpath, year):
    # year is a value such as "2021"; it is not a column of projectedEvents
    glue_context.write_dynamic_frame.from_options(
        frame=projectedEvents,
        connection_type="s3",
        connection_options={"path": outpath, "partitionKeys": [year]},
        format="parquet")

Here year is a parameter that is not present in the dynamic frame.

This code fails with the error: 'partition column "2021" not found in schema'.

How can I write data to S3 using my own partitions?

Basically, I want to write to the S3 path as "outpath/2021/<parquet_file>".

This would work:

from pyspark.sql.functions import lit

# Add the partition value as a literal column on the frame, then partition on that column
projectedEvents = projectedEvents.withColumn('year', lit(2021))

def write_date(frame, outpath, year):
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": outpath, "partitionKeys": [year]},
        format="parquet")

write_date(projectedEvents, outpath, 'year')

I would suggest that you take another look at partitioning. The partition key has to be a column of the frame being written.
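Since write_dynamic_frame.from_options expects a DynamicFrame while withColumn and lit are Spark DataFrame operations, here is a minimal end-to-end sketch assuming projectedEvents starts out as a DynamicFrame (as in the question): convert it to a DataFrame, add the constant year column, convert it back with DynamicFrame.fromDF, and write with "year" as the partition key. The helper name write_with_year and the transformation context string are placeholders, not part of the original code.

from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import lit

def write_with_year(dyf, outpath, year_value):
    # Convert to a Spark DataFrame so the constant partition column can be added
    df = dyf.toDF().withColumn("year", lit(year_value))
    # Convert back to a DynamicFrame before handing it to the Glue writer
    dyf_with_year = DynamicFrame.fromDF(df, glue_context, "dyf_with_year")
    glue_context.write_dynamic_frame.from_options(
        frame=dyf_with_year,
        connection_type="s3",
        connection_options={"path": outpath, "partitionKeys": ["year"]},
        format="parquet")

write_with_year(projectedEvents, outpath, 2021)

Note that partitionKeys writes Hive-style directories, so the objects land under outpath/year=2021/<parquet_file> rather than outpath/2021/<parquet_file>. If the bare outpath/2021/ layout is strictly required, appending the value to the path itself (e.g. "path": outpath + "/2021") and omitting partitionKeys is another option.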
