I have 2 S3 buckets with the following format: s3://bucket/{lob_name_1}/{table_name}/{current_date}/table_name.csv s3://bucket/{lob_name_2} ...
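The question above is truncated, but the key layout it describes can be composed with a small helper. The bucket, LOB, and table names below are placeholders, and the sketch assumes the `{current_date}` segment is an ISO date:

```python
from datetime import date

def build_s3_key(lob_name: str, table_name: str, run_date: date) -> str:
    """Compose a key of the form {lob}/{table}/{YYYY-MM-DD}/{table}.csv."""
    return f"{lob_name}/{table_name}/{run_date.isoformat()}/{table_name}.csv"

# Hypothetical example values; the real LOB/table names are not in the snippet.
key = build_s3_key("lob_name_1", "orders", date(2024, 1, 15))
print(f"s3://bucket/{key}")  # s3://bucket/lob_name_1/orders/2024-01-15/orders.csv
```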
I was using username/password in my AWS Glue connections and now I switched to Secrets Manager. Now I get this error when I run my ETL job: An er ...
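The error text is cut off, but a common cause when switching a Glue JDBC connection to Secrets Manager is a secret payload that lacks the keys Glue expects. A sketch of validating a `SecretString` payload (as returned by `secretsmanager:GetSecretValue`), assuming the conventional `username`/`password` JSON shape:

```python
import json

def connection_props_from_secret(secret_string: str) -> dict:
    """Parse a Secrets Manager SecretString into JDBC connection
    properties; raise early if the expected keys are missing."""
    secret = json.loads(secret_string)
    missing = {"username", "password"} - secret.keys()
    if missing:
        raise KeyError(f"secret is missing expected keys: {sorted(missing)}")
    return {"user": secret["username"], "password": secret["password"]}

# Hypothetical payload; a real one comes from boto3's get_secret_value call.
props = connection_props_from_secret('{"username": "etl_user", "password": "s3cr3t"}')
```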
Summary The terraform config below creates aws_glue_catalog_database and aws_glue_catalog_table resources, but does not define an s3 bucket output lo ...
I am having a problem where I have set enableUpdateCatalog=True and also updateBehaviour=LOG to update my glue table which has 1 partition key. After ...
I was wondering why in the Glue/Athena/Redshift Spectrum documentation and workshops, all the partitioning examples on dates use 3 columns (year/month ...
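The question is truncated, but the trade-off it asks about can be illustrated: three integer columns produce nested Hive-style prefixes that prune hierarchically, while a single date column keeps one flat key. The column names below (`year`/`month`/`day`, `dt`) are the conventional ones, used here only for illustration:

```python
from datetime import date

def partition_prefix_3col(d: date) -> str:
    """Nested layout: year=/month=/day= (the 3-column style in the docs)."""
    return f"year={d.year}/month={d.month:02d}/day={d.day:02d}"

def partition_prefix_1col(d: date) -> str:
    """Flat layout with a single date-typed partition column."""
    return f"dt={d.isoformat()}"

d = date(2024, 1, 15)
print(partition_prefix_3col(d))  # year=2024/month=01/day=15
print(partition_prefix_1col(d))  # dt=2024-01-15
```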
Previously, to move data to a Redshift table we used the COPY command, which has data conversion parameters like BLANKSASNULL and EMPT ...
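The question is cut off, but COPY's BLANKSASNULL and EMPTYASNULL behavior can be reproduced inside a Glue job before writing. In a real job this would be a DataFrame transformation; the plain-Python sketch below just shows the per-field rule:

```python
def blanks_and_empties_to_null(value):
    """Emulate Redshift COPY's EMPTYASNULL ('' -> NULL) and
    BLANKSASNULL (whitespace-only -> NULL) for a single field."""
    if value is None:
        return None
    if value.strip() == "":  # covers both empty and blank strings
        return None
    return value

row = {"id": "1", "name": "   ", "city": ""}
cleaned = {k: blanks_and_empties_to_null(v) for k, v in row.items()}
print(cleaned)  # {'id': '1', 'name': None, 'city': None}
```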
I am trying to query delta tables from my AWS Glue Catalog on Databricks SQL Engine. They are stored in Delta Lake format. I have glue crawlers automa ...
I run a Glue Crawler over a nested JSON data-source on S3 and I tried to query nested fields as per documentation via Redshift Spectrum: But as per ...
When we update data in an existing partition by manually uploading to the S3 bucket, the data shows up in the existing partition in the Athena glue tabl ...
Context: I'm using kinesis to stream data from my lambda into an S3 bucket according to a glue schema. Then I run a crawler on my S3 bucket to catalog ...
I am running an AWS Glue crawler on a CSV file. This CSV file has a string column which has alphanumeric values. The crawler is setting the data type fo ...
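The snippet cuts off, but the behavior it describes (a string column typed as numeric) usually comes from inference over sampled rows that happen to look numeric. A toy inference function shows why a single alphanumeric value should force STRING; this is an illustration of the failure mode, not the crawler's actual algorithm:

```python
def infer_column_type(sample_values):
    """Toy schema inference: a column is 'bigint' only if every sampled
    value parses as an integer, otherwise 'string'. A crawler that only
    samples the numeric-looking rows can mis-type an alphanumeric column."""
    def is_int(v):
        try:
            int(v)
            return True
        except ValueError:
            return False
    return "bigint" if all(is_int(v) for v in sample_values) else "string"

print(infer_column_type(["100", "200", "300"]))   # bigint
print(infer_column_type(["100", "A200", "300"]))  # string
```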
I have an error with the way Zeppelin caches tables. We update the data in the Glue Data Catalog in real time, so when we want to query a partition tha ...
We need to ignore a few paths while crawling through a specific path. Below are the details: Full path : "s3://dev-bronze/api/sp/reports/xyz/brand= ...
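The pattern list is truncated, but simple crawler exclude patterns can be sanity-checked locally with shell-style globs. This is only a rough approximation: real Glue exclude patterns also support `**` and brace expansion, and the keys below are hypothetical stand-ins for the truncated path:

```python
from fnmatch import fnmatch

def is_excluded(relative_key: str, exclude_patterns) -> bool:
    """Approximate crawler exclude matching with fnmatch globs,
    relative to the crawler's include path."""
    return any(fnmatch(relative_key, p) for p in exclude_patterns)

patterns = ["brand=*/archive/*"]
print(is_excluded("brand=acme/archive/old.parquet", patterns))  # True
print(is_excluded("brand=acme/2024/new.parquet", patterns))     # False
```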
I ran into the following situation: let's say I have the following s3 structure s3://my_bucket/path_to_crawl/partition=A/some_file.parquet s3:// ...
We know that the procedure for writing from a PySpark script (AWS Glue job) to the AWS Data Catalog is to write to an S3 bucket (e.g. CSV), use a crawler, and sche ...
Seemingly cannot get Athena partition projection to work. When I add partitions the "old fashioned" way and then run MSCK REPAIR TABLE testparts ...
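The question is truncated, but a frequent cause of projection "not working" is an incomplete set of table properties. The minimal set for a single date-typed partition column is shown below as a Python dict for illustration; the column name `dt`, the range, and the bucket path are placeholders:

```python
# Minimal Athena partition-projection TBLPROPERTIES for one date
# partition column named 'dt' (names, range, and path are placeholders).
projection_properties = {
    "projection.enabled": "true",
    "projection.dt.type": "date",
    "projection.dt.range": "2020-01-01,NOW",
    "projection.dt.format": "yyyy-MM-dd",
    "storage.location.template": "s3://bucket/prefix/dt=${dt}/",
}

# With projection enabled, MSCK REPAIR TABLE and manual ADD PARTITION are
# bypassed: Athena computes the partition list from these properties, so
# partitions added "the old fashioned way" are not what gets queried.
```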
I created a glue job using the visual tab like below. First I connected to a mysql table as data source which is already in my data catalog. Then in t ...
I have 2 AWS buckets, staging and destination, both have the same number of subfolders, let's assume 3. So staging has 3 named a, b, c and destination hav ...
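The rest of the question is cut off, but the copy it describes is a prefix rewrite. A sketch assuming the subfolders match one-to-one and that only the leading prefix changes (prefix names here are placeholders):

```python
def map_staging_key_to_destination(key: str, staging_prefix: str,
                                   destination_prefix: str) -> str:
    """Rewrite a staging object key to its destination location,
    keeping the subfolder (a, b, c) and file name intact."""
    if not key.startswith(staging_prefix):
        raise ValueError(f"{key!r} is not under {staging_prefix!r}")
    return destination_prefix + key[len(staging_prefix):]

dest = map_staging_key_to_destination("staging/a/file1.csv",
                                      "staging/", "destination/")
print(dest)  # destination/a/file1.csv
```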
I am building a POC with Lake Formation where I read a queue of train movement information and persist the individual events into a governed table usi ...
Where can I see, for example, the print output that is written in my AWS Glue script? Like a terminal screen that shows me the messages that were stored in ...