
AWS Glue: How to make sure a Glue crawler always picks up the latest file from S3

Original post: 2022-10-05 12:15:19 | Tags: amazon-web-services, amazon-s3, aws-glue

I have an ETL pipeline that writes a .csv file to S3 every 15 minutes. How can I configure a Glue crawler so that it picks up only the latest file instead of all the files?

1 answer

Using incremental crawls:

For an Amazon Simple Storage Service (Amazon S3) data source, incremental crawls only crawl folders that were added since the last crawler run. Without this option, the crawler crawls the entire dataset. ... To perform an incremental crawl, you can set the Crawl new folders only option in the AWS Glue console or set the RecrawlPolicy property in the CreateCrawler request in the API.
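As a rough illustration of the API route, here is a minimal Python sketch that builds a CreateCrawler request with `RecrawlPolicy` set to crawl new folders only. The crawler name, role ARN, database name, and S3 path are hypothetical placeholders, and the helper function names are my own. Setting both `SchemaChangePolicy` behaviors to `LOG` reflects my understanding of what Glue expects for incremental crawls, so treat it as an assumption to check against the current documentation.

```python
import json


def build_incremental_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for Glue's CreateCrawler call (illustrative helper)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Only crawl folders added since the last crawler run.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
        # Assumption: incremental crawls need schema changes to be logged only.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"},
    }


def create_crawler(request):
    """Send the request to AWS Glue; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)


if __name__ == "__main__":
    # All values below are placeholders, not taken from the question.
    request = build_incremental_crawler_request(
        name="etl-output-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
        database="etl_db",
        s3_path="s3://example-bucket/etl-output/",
    )
    print(json.dumps(request, indent=2))
    # create_crawler(request)  # uncomment to actually create the crawler
```

Note that, per the quoted text, incremental crawls operate on folders rather than individual files, so this likely only helps if each ETL run writes its .csv under a new prefix (for example a timestamped folder).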

AWS Glue Crawler: want separate table for folder in s3

AWS Athena Returning Zero Records from Tables Created from GLUE Crawler input csv from S3

AWS Athena Return Zero Records from Tables Created by GLUE Crawler input csv from S3

AWS Glue job to unzip a file from S3 and write it back to S3

AWS Glue Crawler issue

Loading parquet file from S3 to AWS RDS taking extremely long time using AWS Glue ETL

Load data from S3 into Aurora Serverless using AWS Glue

How to convert JSON to CSV file from s3 and save it in same s3 bucket using Glue job

AWS Glue reading glue catalog table VS reading files from s3

update schedule of a glue crawler on aws




solution1 0 2022-10-05 14:29:23
