Glue crawler creating multiple tables

I have two S3 locations in the following format:

  1. s3://bucket/{lob_name_1}/{table_name}/{current_date}/table_name.csv
  2. s3://bucket/{lob_name_2}/{table_name}/{current_date}/table_name.csv

We have tables with the same name in two different LOBs (lines of business), and one AWS Glue crawler per LOB. When the crawler for the first LOB runs, the tables are created as expected. When the crawler for the second LOB runs, the tables that LOB 1 and LOB 2 have in common are created again under a different name. Is there a way to prevent these additional tables from being created when the crawler for the second LOB runs?
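
To make the collision concrete, here is a small stdlib-only Python sketch that parses paths in the layout above and lists the table names that exist under both LOB prefixes; those are the tables that end up with a second, renamed copy. It assumes the crawler names each table after the {table_name} folder and treats {current_date} as a partition, which is the usual Glue behavior but worth confirming in your own catalog. The function names and sample paths are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Set
from urllib.parse import urlparse


class ObjectKey(NamedTuple):
    """Components of one object path: s3://bucket/{lob}/{table}/{date}/{file}."""
    lob: str
    table: str
    date: str
    filename: str


def parse_key(uri: str) -> ObjectKey:
    """Split an S3 URI in the layout from the question into its parts."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    if parsed.scheme != "s3" or len(parts) != 4:
        raise ValueError(f"unexpected layout: {uri}")
    return ObjectKey(*parts)


def colliding_tables(uris: Iterable[str]) -> List[str]:
    """Return table names that appear under more than one LOB prefix.

    Assumption: the crawler derives the table name from the {table} folder,
    so these are the names both crawlers try to create in the same database.
    """
    lobs_by_table: Dict[str, Set[str]] = {}
    for uri in uris:
        key = parse_key(uri)
        lobs_by_table.setdefault(key.table, set()).add(key.lob)
    return sorted(t for t, lobs in lobs_by_table.items() if len(lobs) > 1)


if __name__ == "__main__":
    # Hypothetical sample paths for two LOBs.
    sample = [
        "s3://bucket/lob_a/customers/2024-01-01/customers.csv",
        "s3://bucket/lob_b/customers/2024-01-01/customers.csv",
        "s3://bucket/lob_b/orders/2024-01-01/orders.csv",
    ]
    print(colliding_tables(sample))  # ['customers']
```

Only customers is reported, because orders exists under a single LOB and so never collides.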

There is a crawler setting you should be using that will fix this:

Create a single schema for each S3 path: true

Configuration options

Schema updates in the data store: Ignore the change and don't update the table in the data catalog.

Inherit schema from table: Update all new and existing partitions with metadata from the table.

Object deletion in the data store: Ignore the change and don't update the table in the data catalog.
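
If you manage the crawlers with boto3 rather than the console, the options above map, to the best of my knowledge, onto these API fields: "Create a single schema for each S3 path" is the Grouping policy CombineCompatibleSchemas in the crawler's Configuration JSON, "Inherit schema from table" is the partition AddOrUpdateBehavior InheritFromTable, and the two "Ignore the change" options are UpdateBehavior and DeleteBehavior set to LOG. This is a minimal sketch; the crawler name, database, and S3 path are hypothetical, and the JSON keys are worth checking against the AWS Glue crawler configuration documentation before you rely on them.

```python
import json


def crawler_settings(name: str, database: str, s3_path: str) -> dict:
    """Build keyword arguments for the boto3 Glue client's update_crawler.

    Console option -> API field (assumed mapping):
      Create a single schema for each S3 path -> Grouping.TableGroupingPolicy
      Inherit schema from table -> CrawlerOutput.Partitions.AddOrUpdateBehavior
      Schema updates / object deletion: ignore -> SchemaChangePolicy = LOG
    """
    configuration = {
        "Version": 1.0,
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"},
        },
    }
    return {
        "Name": name,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "SchemaChangePolicy": {
            "UpdateBehavior": "LOG",  # ignore schema changes in the data store
            "DeleteBehavior": "LOG",  # ignore deleted objects in the data store
        },
        # The API takes Configuration as a JSON string, not a dict.
        "Configuration": json.dumps(configuration),
    }


def apply_settings(settings: dict) -> None:
    """Send the settings to AWS Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so crawler_settings works without boto3

    boto3.client("glue").update_crawler(**settings)
```

For example, crawler_settings("lob2_crawler", "my_database", "s3://bucket/lob_name_2/") builds the arguments for the second LOB's crawler, and passing the result to apply_settings updates it. Keeping the builder separate from the API call lets you inspect the generated configuration without touching AWS.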
