I have 2 S3 buckets with the following format: s3://bucket/{lob_name_1}/{table_name}/{current_date}/table_name.csv s3://bucket/{lob_name_2} ...
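The question above is truncated, but the key layout it describes can be composed with a small helper. The bucket, LOB, and table names below are placeholders, and the sketch assumes the `{current_date}` segment is an ISO date:

```python
from datetime import date

def build_s3_key(lob_name: str, table_name: str, run_date: date) -> str:
    """Compose a key of the form {lob}/{table}/{YYYY-MM-DD}/{table}.csv."""
    return f"{lob_name}/{table_name}/{run_date.isoformat()}/{table_name}.csv"

# Hypothetical example values; the real LOB/table names are not in the snippet.
key = build_s3_key("lob_name_1", "orders", date(2024, 1, 15))
print(f"s3://bucket/{key}")  # s3://bucket/lob_name_1/orders/2024-01-15/orders.csv
```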
I was using username/password in my AWS Glue connections and now I switched to Secrets Manager. Now I get this error when I run my ETL job: An er ...
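The error text is cut off, but a common cause when switching a Glue JDBC connection to Secrets Manager is a secret payload that lacks the keys Glue expects. A sketch of validating a `SecretString` payload (as returned by `secretsmanager:GetSecretValue`), assuming the conventional `username`/`password` JSON shape:

```python
import json

def connection_props_from_secret(secret_string: str) -> dict:
    """Parse a Secrets Manager SecretString into JDBC connection
    properties; raise early if the expected keys are missing."""
    secret = json.loads(secret_string)
    missing = {"username", "password"} - secret.keys()
    if missing:
        raise KeyError(f"secret is missing expected keys: {sorted(missing)}")
    return {"user": secret["username"], "password": secret["password"]}

# Hypothetical payload; a real one comes from boto3's get_secret_value call.
props = connection_props_from_secret('{"username": "etl_user", "password": "s3cr3t"}')
```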
Summary The terraform config below creates aws_glue_catalog_database and aws_glue_catalog_table resources, but does not define an s3 bucket output lo ...
I am having a problem where I have set enableUpdateCatalog=True and also updateBehaviour=LOG to update my glue table which has 1 partition key. After ...
I was wondering why in the Glue/Athena/Redshift Spectrum documentation and workshops, all the partitioning examples on dates use 3 columns (year/month ...
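The question is truncated, but the trade-off it asks about can be illustrated: three integer columns produce nested Hive-style prefixes that prune hierarchically, while a single date column keeps one flat key. The column names below (`year`/`month`/`day`, `dt`) are the conventional ones, used here only for illustration:

```python
from datetime import date

def partition_prefix_3col(d: date) -> str:
    """Nested layout: year=/month=/day= (the 3-column style in the docs)."""
    return f"year={d.year}/month={d.month:02d}/day={d.day:02d}"

def partition_prefix_1col(d: date) -> str:
    """Flat layout with a single date-typed partition column."""
    return f"dt={d.isoformat()}"

d = date(2024, 1, 15)
print(partition_prefix_3col(d))  # year=2024/month=01/day=15
print(partition_prefix_1col(d))  # dt=2024-01-15
```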
Previously, to move data to a Redshift table we used the COPY command, which has data conversion parameters like BLANKSASNULL and EMPT ...
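The question is cut off, but COPY's BLANKSASNULL and EMPTYASNULL behavior can be reproduced inside a Glue job before writing. In a real job this would be a DataFrame transformation; the plain-Python sketch below just shows the per-field rule:

```python
def blanks_and_empties_to_null(value):
    """Emulate Redshift COPY's EMPTYASNULL ('' -> NULL) and
    BLANKSASNULL (whitespace-only -> NULL) for a single field."""
    if value is None:
        return None
    if value.strip() == "":  # covers both empty and blank strings
        return None
    return value

row = {"id": "1", "name": "   ", "city": ""}
cleaned = {k: blanks_and_empties_to_null(v) for k, v in row.items()}
print(cleaned)  # {'id': '1', 'name': None, 'city': None}
```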
I am trying to query delta tables from my AWS Glue Catalog on Databricks SQL Engine. They are stored in Delta Lake format. I have glue crawlers automa ...
I run a Glue Crawler over a nested JSON data-source on S3 and I tried to query nested fields as per documentation via Redshift Spectrum: But as per ...
When we update data in an existing partition by manually uploading to the S3 bucket, the data shows up in the existing partition in the Athena glue tabl ...
Context: I'm using kinesis to stream data from my lambda into an S3 bucket according to a glue schema. Then I run a crawler on my S3 bucket to catalog ...
I am running an AWS Glue crawler on a CSV file. This CSV file has a string column which has alphanumeric values. The crawler is setting the data type fo ...
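The snippet cuts off, but the behavior it describes (a string column typed as numeric) usually comes from inference over sampled rows that happen to look numeric. A toy inference function shows why a single alphanumeric value should force STRING; this is an illustration of the failure mode, not the crawler's actual algorithm:

```python
def infer_column_type(sample_values):
    """Toy schema inference: a column is 'bigint' only if every sampled
    value parses as an integer, otherwise 'string'. A crawler that only
    samples the numeric-looking rows can mis-type an alphanumeric column."""
    def is_int(v):
        try:
            int(v)
            return True
        except ValueError:
            return False
    return "bigint" if all(is_int(v) for v in sample_values) else "string"

print(infer_column_type(["100", "200", "300"]))   # bigint
print(infer_column_type(["100", "A200", "300"]))  # string
```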
I have an error with the way Zeppelin caches tables. We update the data in the Glue Data Catalog in real time, so when we want to query a partition tha ...
We need to ignore a few paths while crawling through a specific path. Below are the details: Full path : "s3://dev-bronze/api/sp/reports/xyz/brand= ...
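The pattern list is truncated, but simple crawler exclude patterns can be sanity-checked locally with shell-style globs. This is only a rough approximation: real Glue exclude patterns also support `**` and brace expansion, and the keys below are hypothetical stand-ins for the truncated path:

```python
from fnmatch import fnmatch

def is_excluded(relative_key: str, exclude_patterns) -> bool:
    """Approximate crawler exclude matching with fnmatch globs,
    relative to the crawler's include path."""
    return any(fnmatch(relative_key, p) for p in exclude_patterns)

patterns = ["brand=*/archive/*"]
print(is_excluded("brand=acme/archive/old.parquet", patterns))  # True
print(is_excluded("brand=acme/2024/new.parquet", patterns))     # False
```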
I ran into the following situation: let's say I have the following s3 structure s3://my_bucket/path_to_crawl/partition=A/some_file.parquet s3:// ...
We know that the procedure for writing from a PySpark script (AWS Glue job) to the AWS Data Catalog is to write to an S3 bucket (e.g. CSV), use a crawler, and sche ...
Seemingly cannot get Athena partition projection to work. When I add partitions the "old fashioned" way and then run MSCK REPAIR TABLE testparts ...
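The question is truncated, but a frequent cause of projection "not working" is an incomplete set of table properties. The minimal set for a single date-typed partition column is shown below as a Python dict for illustration; the column name `dt`, the range, and the bucket path are placeholders:

```python
# Minimal Athena partition-projection TBLPROPERTIES for one date
# partition column named 'dt' (names, range, and path are placeholders).
projection_properties = {
    "projection.enabled": "true",
    "projection.dt.type": "date",
    "projection.dt.range": "2020-01-01,NOW",
    "projection.dt.format": "yyyy-MM-dd",
    "storage.location.template": "s3://bucket/prefix/dt=${dt}/",
}

# With projection enabled, MSCK REPAIR TABLE and manual ADD PARTITION are
# bypassed: Athena computes the partition list from these properties, so
# partitions added "the old fashioned way" are not what gets queried.
```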
I created a glue job using the visual tab like below. First I connected to a mysql table as data source which is already in my data catalog. Then in t ...
I have 2 AWS buckets, staging and destination, both have the same number of subfolders, let's assume 3. So staging has 3 named a, b, c and destination hav ...
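The rest of the question is cut off, but the copy it describes is a prefix rewrite. A sketch assuming the subfolders match one-to-one and that only the leading prefix changes (prefix names here are placeholders):

```python
def map_staging_key_to_destination(key: str, staging_prefix: str,
                                   destination_prefix: str) -> str:
    """Rewrite a staging object key to its destination location,
    keeping the subfolder (a, b, c) and file name intact."""
    if not key.startswith(staging_prefix):
        raise ValueError(f"{key!r} is not under {staging_prefix!r}")
    return destination_prefix + key[len(staging_prefix):]

dest = map_staging_key_to_destination("staging/a/file1.csv",
                                      "staging/", "destination/")
print(dest)  # destination/a/file1.csv
```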
I am building a POC with Lake Formation where I read a queue of train movement information and persist the individual events into a governed table usi ...
Where can I see, for example, the print output that is written in my AWS Glue script? Like a terminal screen that shows me the messages that were stored in ...