AWS Glue enableUpdateCatalog not creating new partitions after successful job run

Question

I have set enableUpdateCatalog=True and updateBehavior=LOG to update my Glue table, which has one partition key. After the job runs, no new partitions are added to the Glue Catalog table, even though the data in S3 is separated by the partition key I used. How do I get the job to add partitions to the catalog table automatically? Currently I have to run boto3 create_partition manually to create them. I want the job to create partitions automatically as it discovers them in the S3 path, which is separated by the partition keys. Code:

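additionalOptions = {
    "enableUpdateCatalog": True,   # ask the job to update the Data Catalog
    "updateBehavior": "LOG",       # log schema changes instead of rewriting the schema
}
additionalOptions["partitionKeys"] = ["partition_key0", "partition_key1"]

# "<dst_db_name>" and "<dst_tbl_name>" are placeholders for the target database/table
my_df = glueContext.write_dynamic_frame_from_catalog(
    frame=last_transform,
    database="<dst_db_name>",
    table_name="<dst_tbl_name>",
    transformation_ctx="DataSink1",
    additional_options=additionalOptions,
)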
job.commit()

PS: I am currently using the Parquet format.

Am I missing any permissions that have to be granted to the job so that it can create the partitions itself?
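For comparison, the manual workaround mentioned in the question could look roughly like the sketch below. It is a minimal sketch, not the poster's actual code, and it assumes Hive-style S3 prefixes such as `.../partition_key0=a/partition_key1=b/` under the table's location, and that each partition can reuse the table's StorageDescriptor with its own Location. The helper names (`partition_values_from_key`, `build_partition_input`, `register_partitions`) are hypothetical. It uses boto3's `batch_create_partition` instead of one `create_partition` call per partition.

```python
from copy import deepcopy
from typing import Any, Dict, List, Sequence


def partition_values_from_key(key: str, partition_keys: Sequence[str]) -> List[str]:
    """Extract partition values from a Hive-style S3 key, e.g.
    'tbl/partition_key0=a/partition_key1=b/part-0000.parquet' -> ['a', 'b']."""
    pairs = {}
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep:
            pairs[name] = value
    # Glue expects Values in the same order as the table's PartitionKeys.
    return [pairs[k] for k in partition_keys]


def build_partition_input(table: Dict[str, Any], values: List[str]) -> Dict[str, Any]:
    """Build a PartitionInput that copies the table's storage descriptor.
    Assumed layout: <table location>/<key>=<value>/... (hypothetical)."""
    sd = deepcopy(table["StorageDescriptor"])
    names = [k["Name"] for k in table["PartitionKeys"]]
    suffix = "/".join(f"{n}={v}" for n, v in zip(names, values))
    sd["Location"] = sd["Location"].rstrip("/") + "/" + suffix + "/"
    return {"Values": values, "StorageDescriptor": sd}


def register_partitions(glue_client, database: str, table_name: str,
                        s3_keys: Sequence[str]) -> List[Dict[str, Any]]:
    """Create catalog partitions for every distinct partition found in s3_keys.
    Returns the per-partition errors reported by Glue (e.g. already exists)."""
    table = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    names = [k["Name"] for k in table["PartitionKeys"]]
    distinct: List[List[str]] = []
    for key in s3_keys:
        values = partition_values_from_key(key, names)
        if values not in distinct:
            distinct.append(values)
    inputs = [build_partition_input(table, v) for v in distinct]
    errors: List[Dict[str, Any]] = []
    # batch_create_partition accepts at most 100 partitions per request.
    for i in range(0, len(inputs), 100):
        resp = glue_client.batch_create_partition(
            DatabaseName=database,
            TableName=table_name,
            PartitionInputList=inputs[i:i + 100],
        )
        errors.extend(resp.get("Errors", []))
    return errors
```

In a real job, `glue_client` would be `boto3.client("glue")` and `s3_keys` would come from listing the target prefix; existing partitions are reported in the returned errors rather than raised.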

Answer 1

I got it to work by adding the table property useGlueParquetWriter: 'true' to the Data Catalog table. I also added

format_options = {
    'useGlueParquetWriter': True
}

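to the write_dynamic_frame.from_catalog calls. After these two changes, the job started creating the new partitions in the catalog. This is consistent with the AWS Glue documentation, which states that creating or updating Parquet tables in the Data Catalog from a job requires the AWS Glue optimized Parquet writer.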
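The table-property step can also be scripted instead of edited in the console. The following is a minimal sketch, assuming boto3's Glue client. It reads the table with `get_table`, keeps only the fields that `update_table` accepts in `TableInput` (the `get_table` response also has read-only fields such as `CreateTime` and `DatabaseName`, which boto3 rejects as unknown parameters), and sets `useGlueParquetWriter` to the string `"true"`. The helper names are hypothetical, and the field list is an assumption based on the `TableInput` shape.

```python
from copy import deepcopy
from typing import Any, Dict

# Fields copied into TableInput; read-only fields returned by get_table
# (CreateTime, DatabaseName, CatalogId, VersionId, ...) are dropped.
TABLE_INPUT_FIELDS = (
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
)


def with_glue_parquet_writer(table: Dict[str, Any]) -> Dict[str, Any]:
    """Return a TableInput copy of `table` with useGlueParquetWriter=true."""
    table_input = {k: deepcopy(v) for k, v in table.items() if k in TABLE_INPUT_FIELDS}
    params = table_input.setdefault("Parameters", {})
    params["useGlueParquetWriter"] = "true"  # catalog table parameters are strings
    return table_input


def enable_glue_parquet_writer(glue_client, database: str, table_name: str) -> None:
    """Set the table property on an existing Data Catalog table."""
    table = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    # update_table replaces the definition, so every accepted field is carried over.
    glue_client.update_table(DatabaseName=database,
                             TableInput=with_glue_parquet_writer(table))
```

Copying the existing definition rather than sending only `Parameters` matters because `update_table` treats `TableInput` as the full new table definition.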

solution1
0 2022-09-12 18:27:15
