How to add partitions not in dynamic frame while writing data to S3 in AWS Glue script
While writing data to S3 from a dynamic frame, I want to partition by columns that are not in the dynamic frame.
For example:
def write_date(outpath, year):
    glue_context.write_dynamic_frame.from_options(
        frame = projectedEvents,
        connection_type = "s3",
        connection_options = {"path": outpath, "partitionKeys": [year]},
        format = "parquet")
Here year is a parameter that is not present in the dynamic frame.
This code fails with the error: 'partition column "2021" not found in schema'
How can I write data to S3 using my own partitions?
Basically, I want to write to the S3 path "outpath/2021/<parquet_file>".
This would work:
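from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import lit

# withColumn is a Spark DataFrame method, so convert the DynamicFrame first.
df = projectedEvents.toDF().withColumn('year', lit(2021))
projectedEvents = DynamicFrame.fromDF(df, glue_context, 'projectedEvents')

def write_date(frame, outpath, partition_col):
    # partitionKeys takes column names, not values.
    glue_context.write_dynamic_frame.from_options(
        frame = frame,
        connection_type = "s3",
        connection_options = {"path": outpath, "partitionKeys": [partition_col]},
        format = "parquet")

write_date(projectedEvents, outpath, 'year')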
I would suggest taking another look at how partitioning works. Each entry in partitionKeys has to be the name of a column in the frame being written, not a value.
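One thing to keep in mind: partitionKeys writes Hive-style directories, so partitioning on a year column gives "outpath/year=2021/<parquet_file>" rather than "outpath/2021/<parquet_file>". If the literal layout from the question is what you need, one option is to skip partitionKeys and write straight to the year prefix. A minimal sketch, assuming glue_context and the frame are passed in from your job; year_prefix and write_to_year_prefix are hypothetical names of mine, not part of the Glue API:

```python
def year_prefix(outpath, year):
    """Build '<outpath>/<year>/' without doubling the slash (hypothetical helper)."""
    return f"{outpath.rstrip('/')}/{year}/"


def write_to_year_prefix(glue_context, frame, outpath, year):
    """Write `frame` as Parquet directly under '<outpath>/<year>/'.

    No partitionKeys are passed, so the frame does not need a 'year' column.
    """
    return glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": year_prefix(outpath, year)},
        format="parquet",
    )
```

You would call it as write_to_year_prefix(glue_context, projectedEvents, outpath, 2021). The trade-off is that the year is then only part of the path: query engines that recognise key=value directories as partition columns will not pick it up as a column named year.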