Execute Stored Procedure in Glue ETL
How can we execute a SQL statement (for example, 'call store_proc();') in Redshift from a PySpark Glue ETL job by using a catalog connection? I want to take the Redshift connection details (host, user, password) from the Glue Catalog connection.
I understand the 'write_dynamic_frame' option, but I am not sure how to execute only a SQL statement against the Redshift server.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=data_frame,
    catalog_connection="Redshift_Catalog_Conn",
    connection_options={"preactions": "call store_proc();", "dbtable": "public.table1", "database": "admin"},
    redshift_tmp_dir="s3://glue_etl/")
As I understand it, you want to call a stored procedure in Redshift from your Glue ETL job. A simple way to do this is to pass the call in the "postactions" connection option, so that Redshift runs it after the frame has been written:
# SQL that Redshift runs after the frame has been loaded into my_table
post_query = "begin; CALL sp_procedure1(); end;"
datasink = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=mydf,
    catalog_connection="redshift_connection",
    connection_options={"dbtable": "my_table", "database": "dev", "postactions": post_query},
    redshift_tmp_dir="s3://tempb/temp/",
    transformation_ctx="datasink")
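If more than one procedure has to run after the load, the "postactions" string can be assembled by a small helper. The following is only a sketch: the function name build_postactions is hypothetical and not part of the Glue API, and it simply reproduces the begin/end wrapping used above.

```python
def build_postactions(*procedure_calls):
    """Wrap one or more CALL statements in a single begin/end block.

    Hypothetical helper for illustration; it only builds the string that
    is passed as the "postactions" connection option.
    """
    statements = []
    for call in procedure_calls:
        # Normalise each statement so it ends with exactly one semicolon.
        call = call.strip().rstrip(";")
        if call:
            statements.append(call + ";")
    if not statements:
        raise ValueError("at least one statement is required")
    return "begin; " + " ".join(statements) + " end;"


post_query = build_postactions("CALL sp_procedure1()")
print(post_query)  # begin; CALL sp_procedure1(); end;
```

The resulting string is passed as "postactions" exactly as in the call above.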
The other, more elaborate solution is to run SQL queries in application code.
my_conn_options = {
    "url": "jdbc:redshift://host:port/redshift-database-name",
    "dbtable": "redshift-table-name",
    "user": "username",
    "password": "password",
    "redshiftTmpDir": args["TempDir"],
    "aws_iam_role": "arn:aws:iam::account id:role/role-name",
}
df = glueContext.create_dynamic_frame_from_options("redshift", my_conn_options)
spark_df = df.toDF()
spark_df.createOrReplaceTempView("CUSTOM_TABLE_NAME")
# Caution: spark.sql() is evaluated by Spark SQL itself; it is not forwarded to Redshift.
spark.sql('call store_proc();')
Your stored procedure in Redshift should have return values that can be written out to variables.
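For the original goal of running only a SQL statement with the credentials stored in the catalog connection, one possible approach is to read the connection with glueContext.extract_jdbc_conf and open a plain DB-API connection. The sketch below rests on several assumptions: that the returned dictionary contains "url", "user" and "password" keys, that the URL may or may not include the database name depending on the Glue version, and that a DB-API driver such as redshift_connector has been made available to the job. The helper names parse_jdbc_url and run_statement are hypothetical.

```python
import re


def parse_jdbc_url(jdbc_url):
    """Extract host, port and (optional) database from a Redshift JDBC URL."""
    match = re.match(r"^jdbc:redshift://([^:/]+):(\d+)(?:/([^?;]+))?", jdbc_url)
    if match is None:
        raise ValueError("Unexpected JDBC URL: %s" % jdbc_url)
    host, port, database = match.groups()
    return host, int(port), database


def run_statement(connect, jdbc_conf, statement, database=None):
    """Run one SQL statement through a DB-API style connect() callable.

    connect   -- e.g. redshift_connector.connect (assumed to be installed)
    jdbc_conf -- dict with "url", "user" and "password" keys (assumed shape)
    database  -- overrides the database name found in the URL, if any
    """
    host, port, url_database = parse_jdbc_url(jdbc_conf["url"])
    conn = connect(host=host, port=port, database=database or url_database,
                   user=jdbc_conf["user"], password=jdbc_conf["password"])
    try:
        cursor = conn.cursor()
        cursor.execute(statement)
        conn.commit()
    finally:
        # Close the connection even if the statement fails.
        conn.close()


# Inside a Glue job (not runnable outside Glue), usage might look like:
#   import redshift_connector
#   conf = glueContext.extract_jdbc_conf("Redshift_Catalog_Conn")
#   run_statement(redshift_connector.connect, conf, "call store_proc();", database="admin")
```

Passing connect as an argument keeps the helper independent of a particular driver and makes it easy to exercise with a stub outside Glue.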