I am trying to copy files from SFTP to google cloud storage.
While executing getting exception No module named 'airflow.providers.sftp'
. Much appreciated if some one can give pointers.
Code snippet is:
import os
import airflow
from airflow import DAG
from airflow import models
from airflow.operators import python_operator
from airflow.providers.google.cloud.transfers.sftp_to_gcs import SFTPToGCSOperator
from airflow.utils.dates import days_ago
with models.DAG("test_ssh_to_gcs", start_date=days_ago(1), schedule_interval=None) as dag:
copy_file_from_ssh_to_gcs = SFTPToGCSOperator(
task_id="file-copy-ssh-to-gcs",
source_path="/ ",
destination_bucket='test_sftp_to_gcs',
destination_path="test/test.csv",
gcp_conn_id="google_cloud_default",
sftp_conn_id="sftp_test",
)
copy_file_from_ssh_to_gcs
You get the error because SFTPToGCSOperator uses airflow.providers.sftp.operators.SFTPOperator
under the hood, which present in airflow >= 2.0.0.
The bad news is need to upgrade your Airflow version to use airflow.providers.google.cloud.transfers.sftp_to_gcs.SFTPToGCSOperator
.
If you don't want / cannot upgrade airflow, you can create a DAG chaining two operators:
Operator | Airflow 1.x import |
---|---|
SFTP Operator (download a file to a local using 'operation=get') | from airflow.contrib.operators.sftp_operator import SFTPOperator |
Upload File to Google cloud storage | from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator |
This should do the trick:
LOCALFILE = '/tmp/kk'
with models.DAG("test_ssh_to_gcs", start_date=days_ago(1), schedule_interval=None) as dag:
download_sftp = SFTPOperator(
task_id = 'part1_sftp_download_to_local',
ssh_conn_id="sftp_test",
local_file=LOCALFILE,
remote_file='',
operation='get')
gcp_upload = FileToGoogleCloudStorageOperator(
task_id='part2_upload_to_gcs',
bucket='test_sftp_to_gcs',
src=LOCALFILE,
dst="test/test.csv",
google_cloud_storage_conn_id="google_cloud_default" # configured in Airflow
)
sftp_download >> gcp_upload
With airflow 1.10 you can install backported packages
For your case the following need to be added to your composer cluster:
1- apache-airflow-backport-providers-google
2- apache-airflow-backport-providers-sftp
3- apache-airflow-backport-providers-ssh
Install dependency via pip.
pip install apache-airflow-providers-sftp
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.