AWS Glue enableUpdateCatalog not creating new partitions after successful job run

I have set enableUpdateCatalog=True and updateBehavior=LOG to update my Glue table, which has 1 partition key. After the job runs, no new partitions are added to my Glue Catalog table, but the data in S3 is separated by the partition key I used. How do I get the job to partition my Glue Catalog table automatically? Currently I have to run boto3 create_partition manually to create partitions on the table. I want the job to create partitions automatically as it discovers them in the S3 path, separated by partition keys. Code:

# Ask the sink to update the Data Catalog while it writes.
additionalOptions = {
    "enableUpdateCatalog": True,
    "updateBehavior": "LOG",
    "partitionKeys": ["partition_key0", "partition_key1"],
}

# "<dst_db_name>" and "<dst_tbl_name>" are placeholders for the destination database and table.
my_df = glueContext.write_dynamic_frame_from_catalog(
    frame=last_transform,
    database="<dst_db_name>",
    table_name="<dst_tbl_name>",
    transformation_ctx="DataSink1",
    additional_options=additionalOptions,
)
job.commit()

PS: I am currently using the PARQUET format.

Am I missing any permissions that have to be added to my job so that it can create partitions itself?
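For reference, the manual create_partition workaround mentioned in the question could look roughly like the sketch below. It assumes the job writes Hive-style paths (`key=value/` segments), which the question does not state explicitly. The helper names `partition_values_from_key` and `register_partition` are made up for illustration; only `create_partition` is a real boto3 Glue client method.

```python
def partition_values_from_key(key, partition_keys):
    """Return the partition values found in an S3 object key, or None.

    Assumes Hive-style layout, e.g.
    "out/partition_key0=a/partition_key1=b/part-0.parquet".
    """
    # Collect every "name=value" path segment into a dict.
    segments = {}
    for part in key.split("/"):
        name, sep, value = part.partition("=")
        if sep:
            segments[name] = value
    # Glue expects values in the same order as the table's partition keys.
    try:
        return [segments[name] for name in partition_keys]
    except KeyError:
        return None  # the key is missing at least one partition segment


def register_partition(glue_client, database, table, location, values,
                       storage_descriptor):
    """Hypothetical helper: register one partition through boto3.

    `storage_descriptor` is expected to be the table's StorageDescriptor;
    only its Location is overridden to point at the partition prefix.
    """
    descriptor = dict(storage_descriptor, Location=location)
    glue_client.create_partition(
        DatabaseName=database,
        TableName=table,
        PartitionInput={"Values": values, "StorageDescriptor": descriptor},
    )
```

For example, `partition_values_from_key("out/partition_key0=a/partition_key1=b/part-0.parquet", ["partition_key0", "partition_key1"])` gives the list to pass as `values`.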

I got it to work by adding useGlueParquetWriter: 'true' to the Catalog table properties. I have also added

format_options = {
'useGlueParquetWriter': True
}

in the write_dynamic_frame.from_catalog calls. These steps got it to start working :)
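If you would rather set the table property from code than in the console, a minimal sketch with boto3 might look like this. The property name and value come from the answer above. The helper names are made up for illustration, and the sketch assumes the table was fetched with `get_table`, whose response carries read-only fields that `update_table` does not accept in `TableInput`.

```python
# Keys that update_table accepts inside TableInput. get_table also returns
# read-only fields (DatabaseName, CreateTime, ...) that have to be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def table_input_with_parquet_writer(table):
    """Build a TableInput from a get_table result, adding the property."""
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    # Copy the parameters so the caller's dict is left untouched.
    parameters = dict(table_input.get("Parameters", {}))
    parameters["useGlueParquetWriter"] = "true"
    table_input["Parameters"] = parameters
    return table_input


def enable_glue_parquet_writer(database, table_name):
    """Hypothetical helper: read the table, then write it back with the property."""
    import boto3  # imported here so the pure helper above needs no AWS setup

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=table_input_with_parquet_writer(table),
    )
```

Run `enable_glue_parquet_writer` once against the destination table before the job. The format_options part of the answer still has to go in the job script itself.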

