How to make an Airflow DAG read from a Google spreadsheet using a stored connection
I'm trying to build Airflow DAGs that read data from (or write data to) some Google spreadsheets. Among Airflow's connections I have saved one of type "Google Cloud Platform" which includes project_id, scopes, and a "Keyfile JSON": a dictionary with "type", "project_id", "private_key_id", "private_key", "client_email", "client_id", "auth_uri", "token_uri", "auth_provider_x509_cert_url", "client_x509_cert_url".
I can connect to the Google spreadsheet using
cred_dict = ... same as what I saved in Keyfile JSON ...
creds = ServiceAccountCredentials.from_json_keyfile_dict(cred_dict,scope)
client = gspread.authorize(creds)
sheet = client.open(myfile).worksheet(mysheet) # works!
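But I would rather not write the key explicitly in the code and instead import it from the Airflow connection.
I would like to know whether there is a solution along the lines of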
from airflow.hooks.some_hook import get_the_keyfile
conn_id = my_saved_gcp_connection
cred_dict = get_the_keyfile(gcp_conn_id=conn_id)
creds = ServiceAccountCredentials.from_json_keyfile_dict(cred_dict,scope)
client = gspread.authorize(creds)
sheet = client.open(myfile).worksheet(mysheet)
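I see there are several hooks for GCP connections (https://airflow.apache.org/howto/connection/gcp.html), but my limited knowledge keeps me from understanding which one to use and which function (if any) extracts the keyfile from the saved connection.
Any suggestion would be very welcome :)
Below is the code I use to connect to a gspread sheet from Airflow using a stored connection.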
import json

import gspread
from oauth2client.service_account import ServiceAccountCredentials
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook

# Assumed scopes (not given in the original answer); use whatever your sheets need.
scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']

def get_cred_dict(conn_id='my_google_connection'):
    # Read the "Keyfile JSON" field of the stored connection
    gcp_hook = GoogleCloudBaseHook(gcp_conn_id=conn_id)
    return json.loads(gcp_hook._get_field('keyfile_dict'))

def get_client(conn_id='my_google_connection'):
    cred_dict = get_cred_dict(conn_id)
    creds = ServiceAccountCredentials.from_json_keyfile_dict(cred_dict, scope)
    client = gspread.authorize(creds)
    return client

def get_sheet(doc_name, sheet_name):
    client = get_client()
    sheet = client.open(doc_name).worksheet(sheet_name)
    return sheet
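For anyone wondering what the private `_get_field('keyfile_dict')` call is doing: in the contrib hook it looks the field up in the connection's extras under the `extra__google_cloud_platform__` prefix. Here is a minimal sketch of that lookup in plain Python. The function name `keyfile_dict_from_extras` is made up for illustration, and the fallback to an unprefixed `keyfile_dict` key is an assumption about newer provider releases, so check your own connection's extras before relying on it.

```python
import json

# Prefix the contrib GCP hook uses for fields stored in the connection extras.
GCP_EXTRA_PREFIX = "extra__google_cloud_platform__"


def keyfile_dict_from_extras(extras):
    """Return the service-account key found in a connection's extras, or None.

    Hypothetical helper, not part of Airflow.
    """
    # Try the prefixed name first, then the unprefixed one (assumed fallback).
    raw = extras.get(GCP_EXTRA_PREFIX + "keyfile_dict") or extras.get("keyfile_dict")
    if not raw:
        return None
    # The field is normally stored as a JSON string; accept a dict as well.
    return json.loads(raw) if isinstance(raw, str) else raw
```

With a real connection you would pass in something like `BaseHook.get_connection(conn_id).extra_dejson`, which avoids calling a private method on the hook.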
For Airflow 2.5.1 (2023), the following code also works.
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
import gspread
# Create a hook object
# When using the google_cloud_default we can use
# hook = GoogleBaseHook()
# Or, for a delegate, use: GoogleBaseHook(delegate_to='foo@bar.com')
hook = GoogleBaseHook(gcp_conn_id='my_google_cloud_conn_id')
# Get the credentials
credentials = hook.get_credentials()
# Optional: set the delegate email if it is needed later.
# This requires a service account with domain-wide delegation.
credentials = credentials.with_subject('foo@bar.com')
# Use the credentials to authenticate the gspread client
gc = gspread.Client(auth=credentials)
# Create Spreadsheet
gc.create('Yabadabadoooooooo') # Optional use folder_id=
gc.list_spreadsheet_files()
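Since the question was about reading a sheet rather than creating one, here is a rough sketch of how the same credentials could be used for that. The helper names `build_gspread_client` and `read_rows` are my own, not from Airflow or gspread, and it assumes the stored connection's scopes allow access to Sheets and Drive and that the spreadsheet is shared with the service account.

```python
def build_gspread_client(gcp_conn_id="my_google_cloud_conn_id"):
    """Build a gspread client from a stored Airflow connection.

    Hypothetical helper; needs Airflow's Google provider and gspread installed.
    """
    # Imported lazily so read_rows() below can be used and tested on its own.
    from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
    import gspread

    credentials = GoogleBaseHook(gcp_conn_id=gcp_conn_id).get_credentials()
    return gspread.Client(auth=credentials)


def read_rows(client, doc_name, sheet_name):
    """Return every row of the worksheet as a list of lists of cell values."""
    return client.open(doc_name).worksheet(sheet_name).get_all_values()
```

Inside a task you would call something like `read_rows(build_gspread_client(), myfile, mysheet)`, so the hook is only created when the task runs and not every time the scheduler parses the DAG file.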