Copy files with different dates from S3 to GCS
I have to move files from S3 to GCS. The problem is that on Mondays they upload not only Monday's files but also the files for Saturday and Sunday, and these files have different dates in their names. For example: stack_20220430.csv, stack_20220501.csv. I need to move these files in the same Airflow run. Is that possible? I'm using the S3ToGCSOperator:
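# Import path for the Google provider package in Airflow 2
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator

S3ToGCSOperator(
    task_id="move_files_s3_to_gcs",
    bucket=config["s3_params"]["s3_source_bucket"],
    prefix=config["s3_params"]["s3_source_prefix"],
    delimiter="/",
    dest_gcs=config["gcs_params"]["gcs_destination"],
    aws_conn_id=config["s3_params"]["s3_connector_name"],
)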
Obviously, the problem is that prefix takes a fixed value. Can I assign a range for {{ ds }}?
The S3ToGCSOperator copies all files under the bucket and prefix you provide. It does this by listing all of them, then iterating over each file and copying it to GCS. prefix is a templated field, so you can use {{ ds }} with it.
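Since prefix is a single templated string, one way to cover several dates in one run might be to create one task per date, each with its own templated prefix. The sketch below only builds those template strings. It assumes the file names start with a base prefix such as "stack_" followed by a YYYYMMDD date, that ds corresponds to the Monday in the file names, and it uses the hypothetical helper name prefix_templates; macros.ds_add and macros.ds_format are Airflow's built-in template macros.

```python
def prefix_templates(base_prefix: str, days_back: int = 2) -> list[str]:
    """Build one Jinja-templated prefix per day, oldest first.

    Each string is meant to be rendered by Airflow at run time:
    macros.ds_add shifts ds by a number of days and macros.ds_format
    converts it from YYYY-MM-DD to YYYYMMDD.
    """
    templates = []
    for offset in range(days_back, -1, -1):
        templates.append(
            f"{base_prefix}{{{{ macros.ds_format(macros.ds_add(ds, -{offset}), "
            f"'%Y-%m-%d', '%Y%m%d') }}}}"
        )
    return templates


if __name__ == "__main__":
    # "stack_" is an assumed base prefix taken from the example file names.
    for i, template in enumerate(prefix_templates("stack_")):
        # Hypothetical wiring: pass each template as the prefix of its own
        # S3ToGCSOperator task, with a distinct task_id such as
        # f"move_files_s3_to_gcs_{i}".
        print(i, template)
```

Each task then lists and copies only the keys that match its own date. On days when no weekend files exist, the extra tasks would simply find nothing under their prefix.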
You can always inherit from S3ToGCSOperator and customize the behavior of the operator to your specific needs.
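A customized operator would need some rule for deciding which of the listed keys to copy. The sketch below shows only that selection step as a plain function, assuming keys end in _YYYYMMDD.csv as in the question's examples. The name keys_in_window and the regular expression are hypothetical, and how the function is wired into a subclass depends on the provider version, so that part is not shown.

```python
import re
from datetime import date, datetime, timedelta

# Assumed naming scheme: an underscore, an 8-digit date, then ".csv".
DATE_IN_KEY = re.compile(r"_(\d{8})\.csv$")


def keys_in_window(keys, run_date: date, days_back: int = 2):
    """Return the keys whose embedded date lies in [run_date - days_back, run_date]."""
    earliest = run_date - timedelta(days=days_back)
    selected = []
    for key in keys:
        match = DATE_IN_KEY.search(key)
        if match is None:
            continue  # key does not follow the assumed naming scheme
        try:
            file_date = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a valid date
        if earliest <= file_date <= run_date:
            selected.append(key)
    return selected


if __name__ == "__main__":
    listed = [
        "stack_20220429.csv",
        "stack_20220430.csv",
        "stack_20220501.csv",
        "stack_20220502.csv",
    ]
    # 2022-05-02 is a Monday; the window also covers Saturday and Sunday.
    print(keys_in_window(listed, date(2022, 5, 2)))
```

Filtering after listing keeps a single task and a single prefix, at the cost of listing more keys than are copied.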