Glue crawler creating multiple tables

Original 2022-10-05 12:28:25 0 1 amazon-web-services/ aws-glue/ aws-glue-data-catalog

I have 2 S3 buckets with the following format:

s3://bucket/{lob_name_1}/{table_name}/{current_date}/table_name.csv

s3://bucket/{lob_name_2}/{table_name}/{current_date}/table_name.csv

The same table name can belong to 2 different LOBs. We have one AWS Glue crawler per LOB. When the crawler for the first LOB runs, the tables are created as expected. When the crawler for the second LOB runs, the tables that LOB 1 and LOB 2 have in common are created again under a different name. Is there a way to prevent the additional table from being created when the crawler for the second LOB runs?

1 answer

There is a crawler parameter you should be using that will fix your issue:

Create a single schema for each S3 path: true

Configuration options

Schema updates in the data store: Ignore the change and don't update the table in the data catalog.

Inherit schema from table: Update all new and existing partitions with metadata from the table.

Object deletion in the data store: Ignore the change and don't update the table in the data catalog.
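
If you configure the crawler through the API rather than the console, the options above can be sketched roughly as follows. This is a minimal sketch, not a definitive setup: the mapping from console labels to API values (`CombineCompatibleSchemas`, `LOG`, `InheritFromTable`) is my reading of the Glue crawler configuration documentation, `apply_settings` is a hypothetical helper, and the crawler name you pass in is whatever your second LOB's crawler is called.

```python
import json


def build_crawler_settings():
    """Build keyword arguments for glue.update_crawler (assumed mapping of the console options)."""
    configuration = {
        "Version": 1.0,
        # Console: "Create a single schema for each S3 path"
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
        # Console: "Inherit schema from table"
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }
    return {
        # Console: "Ignore the change and don't update the table in the data catalog"
        # for both schema updates and object deletion.
        "SchemaChangePolicy": {
            "UpdateBehavior": "LOG",
            "DeleteBehavior": "LOG",
        },
        # The API expects the crawler configuration as a JSON string.
        "Configuration": json.dumps(configuration),
    }


def apply_settings(crawler_name):
    """Hypothetical helper: push the settings to an existing crawler."""
    import boto3  # imported here so the settings can be built without AWS access

    glue = boto3.client("glue")
    glue.update_crawler(Name=crawler_name, **build_crawler_settings())
```

Calling `apply_settings` with the name of an existing crawler would update it in place; the crawler has to be idle (not running) for the update to be accepted.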

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1 0 2022-10-07 04:25:26
