
How to make an Airflow DAG read from a Google spreadsheet using a stored connection

I'm trying to build Airflow DAGs that read data from (or write data to) some Google spreadsheets. Among the connections in Airflow I've saved one of type "Google Cloud Platform", which includes the project_id, the scopes, and a "Keyfile JSON": a dictionary with "type", "project_id", "private_key_id", "private_key", "client_email", "client_id", "auth_uri", "token_uri", "auth_provider_x509_cert_url", "client_x509_cert_url".

I can connect to the Google spreadsheet using:

cred_dict = ... same as what I saved in Keyfile JSON ...
creds = ServiceAccountCredentials.from_json_keyfile_dict(cred_dict,scope)
client = gspread.authorize(creds)
sheet = client.open(myfile).worksheet(mysheet) # works!

But I'd rather not write the key explicitly in the code, and instead import it from the Airflow connection.

I'd like to know whether there is a solution along these lines:

from airflow.hooks.some_hook import get_the_keyfile
conn_id = my_saved_gcp_connection
cred_dict = get_the_keyfile(gcp_conn_id=conn_id)
creds = ServiceAccountCredentials.from_json_keyfile_dict(cred_dict,scope)
client = gspread.authorize(creds)
sheet = client.open(myfile).worksheet(mysheet)

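I see there are several hooks for GCP connections (https://airflow.apache.org/howto/connection/gcp.html), but my limited knowledge keeps me from working out which one to use and which function (if any) extracts the keyfile from the saved connection.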

Any suggestion would be very welcome :)

Below is the code I use to connect to a gspread sheet from Airflow using a stored connection.

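import json
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook

# Assumed scopes (the original snippet used `scope` without defining it):
# Sheets access plus Drive access, which is needed to open a file by name.
scope = ['https://www.googleapis.com/auth/spreadsheets',
         'https://www.googleapis.com/auth/drive']

def get_cred_dict(conn_id='my_google_connection'):
    # Read the "Keyfile JSON" field of the stored connection and parse it.
    gcp_hook = GoogleCloudBaseHook(gcp_conn_id=conn_id)
    return json.loads(gcp_hook._get_field('keyfile_dict'))

def get_client(conn_id='my_google_connection'):
    # Build service-account credentials from the key and authorize gspread.
    cred_dict = get_cred_dict(conn_id)
    creds = ServiceAccountCredentials.from_json_keyfile_dict(cred_dict, scope)
    client = gspread.authorize(creds)
    return client

def get_sheet(doc_name, sheet_name):
    # Open one worksheet of a spreadsheet, using the default connection id.
    client = get_client()
    sheet = client.open(doc_name).worksheet(sheet_name)
    return sheet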
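For anyone wondering what `_get_field('keyfile_dict')` actually reads: as far as I can tell, in Airflow 1.10 the hook looks the field up in the connection's extras under a prefixed name and returns the raw JSON string. Here is a minimal sketch of that lookup in plain Python, so it can be tried without Airflow installed. The helper name `extract_keyfile_dict`, the `REQUIRED_KEYS` check and the fallback to an unprefixed key are my own assumptions, not part of Airflow.

```python
import json

# Field names the question lists for the service-account "Keyfile JSON".
REQUIRED_KEYS = {"type", "project_id", "private_key_id", "private_key",
                 "client_email", "client_id", "auth_uri", "token_uri"}

# Assumption: Airflow 1.10 stores the field under this prefixed name in the
# connection's extras; newer provider versions may use the bare name instead.
PREFIX = "extra__google_cloud_platform__"


def extract_keyfile_dict(extras):
    """Return the service-account key held in a connection's extras as a dict."""
    raw = extras.get(PREFIX + "keyfile_dict", extras.get("keyfile_dict"))
    if not raw:
        raise ValueError("connection has no Keyfile JSON set")
    # The value is usually a JSON string, but accept an already-parsed dict too.
    key = json.loads(raw) if isinstance(raw, str) else dict(raw)
    missing = REQUIRED_KEYS - key.keys()
    if missing:
        raise ValueError("Keyfile JSON is missing: " + ", ".join(sorted(missing)))
    return key
```

Inside a DAG the extras would presumably come from something like `BaseHook.get_connection(conn_id).extra_dejson`. The check on the key fields just makes a half-pasted keyfile fail early with a readable message instead of deep inside the auth library.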

For Airflow 2.5.1 (2023), the following code also works.

from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
import gspread

# Create a hook object
# When using the google_cloud_default connection we can simply use
# hook = GoogleBaseHook()
# Or, to delegate, use: GoogleBaseHook(delegate_to='foo@bar.com')
hook = GoogleBaseHook(gcp_conn_id='my_google_cloud_conn_id') 

# Get the credentials
credentials = hook.get_credentials()

# Optional: set the delegate email later instead, if needed.
# This requires a service account with domain-wide delegation.
credentials = credentials.with_subject('foo@bar.com')

# Use the credentials to authenticate the gspread client
gc = gspread.Client(auth=credentials)

# Create a spreadsheet (optionally pass folder_id=), then list the files
gc.create('Yabadabadoooooooo')
gc.list_spreadsheet_files()
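One thing to watch with this approach: the credentials carry whatever is in the connection's Scopes field, and as far as I know the hook falls back to the cloud-platform scope when that field is empty, which is not enough for Sheets. A small sketch for checking the field before handing the credentials to gspread follows. The helper names are made up for illustration, and the two scopes are the ones gspread uses by default; your operations may need fewer.

```python
# Scopes gspread uses by default: Sheets, plus Drive for opening files by name.
NEEDED_SCOPES = {
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive",
}


def parse_scopes(scope_field):
    """Split the connection's comma-separated Scopes field into a list."""
    return [s.strip() for s in (scope_field or "").split(",") if s.strip()]


def missing_scopes(scope_field):
    """Return the gspread scopes the connection does not grant, sorted."""
    return sorted(NEEDED_SCOPES - set(parse_scopes(scope_field)))
```

If `missing_scopes(...)` returns anything, adding those URLs to the Scopes field of the connection (comma separated) is probably the simplest fix.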


