
Copy files with different dates from S3 to GCS

I have to move files from S3 to GCS. The problem I have is that on Mondays they upload Monday's files, but also Saturday's and Sunday's, and those files have different dates. For example: stack_20220430.csv, stack_20220501.csv. I need to move these files in the same Airflow run; is that possible? I'm using the S3ToGCSOperator:

from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator

S3ToGCSOperator(
    task_id="move_files_s3_to_gcs",
    bucket=config["s3_params"]["s3_source_bucket"],
    prefix=config["s3_params"]["s3_source_prefix"],
    delimiter="/",
    dest_gcs=config["gcs_params"]["gcs_destination"],
    aws_conn_id=config["s3_params"]["s3_connector_name"],
)

Obviously the problem is that prefix takes a fixed value. Can I assign a range of dates using {{ ds }}?

The S3ToGCSOperator copies/moves all files under the bucket/key you provide. It does this by listing all of them, then iterating over each file and copying it to GCS.

prefix is a templated field, so you can use {{ ds }} with it.
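For instance, here is a minimal sketch of a templated prefix (the bucket, destination, and connection names are placeholders, not taken from the question):

from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator

S3ToGCSOperator(
    task_id="move_files_s3_to_gcs",
    bucket="my-source-bucket",  # placeholder
    # {{ ds_nodash }} renders the run's logical date as YYYYMMDD,
    # so this matches e.g. stack_20220502.csv and nothing else.
    prefix="stack_{{ ds_nodash }}",
    delimiter="/",
    dest_gcs="gs://my-destination-bucket/",  # placeholder
    aws_conn_id="my_s3_conn",  # placeholder
)

Note that a single templated prefix matches only one date, so on its own this will not pick up the Saturday and Sunday files during the Monday run.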

You can always inherit from S3ToGCSOperator and customize the operator's behavior to your specific needs.
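As a hedged sketch of that approach (the class name and the prefixes parameter are mine, and it assumes the parent's execute() reads self.prefix, which holds for current provider versions but is not a documented contract):

from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator

class MultiPrefixS3ToGCSOperator(S3ToGCSOperator):
    """Copy the files of several S3 prefixes (e.g. several dates) in one task."""

    # Also render Jinja inside each entry of the prefixes list.
    template_fields = (*S3ToGCSOperator.template_fields, "prefixes")

    def __init__(self, *, prefixes, **kwargs):
        # Seed the parent with the first prefix; execute() swaps it per pass.
        super().__init__(prefix=prefixes[0], **kwargs)
        self.prefixes = prefixes

    def execute(self, context):
        # The parent's execute() lists the keys under self.prefix and copies
        # them to GCS, so running it once per prefix covers every date.
        for prefix in self.prefixes:
            self.prefix = prefix
            super().execute(context)

A Monday run could then pass prefixes=["stack_{{ macros.ds_format(macros.ds_add(ds, -2), '%Y-%m-%d', '%Y%m%d') }}", "stack_{{ macros.ds_format(macros.ds_add(ds, -1), '%Y-%m-%d', '%Y%m%d') }}", "stack_{{ ds_nodash }}"] to move the Saturday, Sunday, and Monday files in a single task.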

