AWS Glue: How to make sure a Glue crawler always picks up the latest file from S3

I have an ETL pipeline which outputs .csv files into S3 every 15 minutes. How can I configure a Glue crawler so that it picks up only the latest file instead of using all the files?

Using incremental crawls:

For an Amazon Simple Storage Service (Amazon S3) data source, incremental crawls only crawl folders that were added since the last crawler run. Without this option, the crawler crawls the entire dataset. ... To perform an incremental crawl, you can set the Crawl new folders only option in the AWS Glue console or set the RecrawlPolicy property in the CreateCrawler request in the API.
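As a rough illustration of the API route mentioned above, here is a minimal sketch using boto3's `create_crawler` with `RecrawlPolicy`. The helper name `create_incremental_crawler` and all argument values (crawler name, role ARN, database, S3 path) are placeholders I made up, not something from the question. Two caveats: as the quoted documentation says, incremental crawls work on new folders rather than individual files, so this assumes each 15-minute output lands under its own prefix; and as far as I know, this mode expects both schema change behaviors to be set to `LOG`.

```python
def create_incremental_crawler(glue_client, name, role_arn, database, s3_path):
    """Create a crawler that only crawls S3 folders added since its last run.

    glue_client is expected to be boto3.client("glue"); every other
    argument is a hypothetical placeholder supplied by the caller.
    """
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # API equivalent of the "Crawl new folders only" console option.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
        # Assumption: incremental crawls need both behaviors set to LOG.
        "SchemaChangePolicy": {
            "UpdateBehavior": "LOG",
            "DeleteBehavior": "LOG",
        },
    }
    glue_client.create_crawler(**request)
    return request
```

To use it, something like `create_incremental_crawler(boto3.client("glue"), "my-crawler", "<role ARN>", "my_db", "s3://my-bucket/output/")` should work, given credentials and an IAM role that can read the bucket.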
