
Copy from S3 to GCS files with different date

I have to move files from S3 to GCS. The problem is that on Mondays they upload not only Monday's files but also Saturday's and Sunday's, and those files carry different dates in their names, for example: stack_20220430.csv, stack_20220501.csv. I need to move all of these files in the same Airflow run. Is that possible? I'm using the S3ToGCSOperator:

S3ToGCSOperator(
    task_id="move_files_s3_to_gcs",
    bucket=config["s3_params"]["s3_source_bucket"],
    prefix=config["s3_params"]["s3_source_prefix"],
    delimiter="/",
    dest_gcs=config["gcs_params"]["gcs_destination"],
    aws_conn_id=config["s3_params"]["s3_connector_name"],
)

Obviously the problem is that prefix takes a fixed value. Can I assign a date range based on {{ ds }}?

The S3ToGCSOperator copies all files under the bucket/prefix you provide. It does this by listing them, then iterating over each file and copying it to GCS.

prefix is a templated field, so you can use {{ ds }} in it.
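Since prefix is templated, one option (a sketch only, not tested against your config; the stack_ prefix and the day offsets are assumptions taken from the question) is to create one copy task per day offset, using the built-in macros.ds_add and macros.ds_format so a Monday run also picks up the weekend files:

```python
from airflow.providers.amazon.aws.transfers.s3_to_gcs import S3ToGCSOperator

# Assumption: files are named stack_YYYYMMDD... and the run on Monday must
# also cover Saturday and Sunday, i.e. offsets 2, 1, and 0 days back.
for offset in (2, 1, 0):
    S3ToGCSOperator(
        task_id=f"move_files_s3_to_gcs_minus_{offset}",
        bucket=config["s3_params"]["s3_source_bucket"],
        # macros.ds_add shifts ds by -offset days;
        # macros.ds_format converts YYYY-MM-DD to YYYYMMDD.
        prefix=(
            "stack_{{ macros.ds_format(macros.ds_add(ds, -" + str(offset) + "), "
            "'%Y-%m-%d', '%Y%m%d') }}"
        ),
        delimiter="/",
        dest_gcs=config["gcs_params"]["gcs_destination"],
        aws_conn_id=config["s3_params"]["s3_connector_name"],
    )
```

Note that on non-Monday runs these extra tasks would re-list the previous days' prefixes; depending on your needs you may want to gate them behind a branch, or rely on the operator skipping files that already exist at the destination.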

You can also subclass S3ToGCSOperator and customize its behavior to your specific needs.
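As a sketch of what such a subclass could do, the core of it is just computing which date-stamped prefixes a given run must cover. The helper below is hypothetical (the name and the Monday-backlog rule are assumptions from the question), but the date arithmetic is self-contained:

```python
from datetime import date, timedelta

def prefixes_for_run(run_date: date, base: str = "stack_") -> list:
    """Return the S3 key prefixes a run on `run_date` should copy.

    Assumption from the question: on Mondays the upload also contains
    Saturday's and Sunday's files, so a Monday run covers three dates.
    """
    offsets = [2, 1, 0] if run_date.weekday() == 0 else [0]
    return [base + (run_date - timedelta(days=d)).strftime("%Y%m%d") for d in offsets]

# A Monday run (2022-05-02) covers the weekend backlog as well:
# stack_20220430, stack_20220501, stack_20220502
print(prefixes_for_run(date(2022, 5, 2)))
```

A custom operator could loop over these prefixes in its execute method and run the parent's list-and-copy logic once per prefix, so all the files land in GCS in a single Airflow run.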
