
AWS Python Shell - How to use Glue Catalog Connections

I have a JDBC connection defined in Glue, and I can use it successfully in a Glue Spark job. How would I use the same connection in a Glue Python Shell job? I have seen references saying this is possible, but I can't find any templates showing how to do it.

Alternatively, how would I define a JDBC connection in the Python Shell job itself, where I would need to include an external library? I've read that pyodbc is not available because of its dependencies.

When you attach a JDBC connection to a Glue Python Shell job, Glue uses it only to launch elastic network interfaces (ENIs) in the connection's subnet with its security groups, so that the job can reach resources inside your VPC. Glue does not pass the JDBC URL, username, or password to the Python Shell job.
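A minimal sketch of what "attaching" a connection looks like in a job definition, assuming the job is created with boto3's `glue.create_job` rather than in the console. The job name, role ARN, script path and connection name are hypothetical placeholders. Only the `Connections` entry ties the job to the Glue connection; the `AWS::Glue::Job` CloudFormation resource has a property with the same `Connections: {Connections: [...]}` shape.

```python
def python_shell_job_params(name, role_arn, script_s3_path, connection_name):
    """Build keyword arguments for boto3's glue.create_job (all values hypothetical)."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",  # Python Shell job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        # Glue launches the job's ENIs in this connection's subnet with its
        # security groups; the JDBC URL and credentials are not injected.
        "Connections": {"Connections": [connection_name]},
        "MaxCapacity": 0.0625,  # smallest Python Shell capacity (1/16 DPU)
    }


params = python_shell_job_params(
    "my-shell-job",
    "arn:aws:iam::123456789012:role/MyGlueRole",
    "s3://my-bucket/scripts/job.py",
    "my-jdbc-connection",
)
# boto3.client("glue").create_job(**params) would submit this definition.
```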

Thus, you'll have to provide an external package, such as pymssql (http://www.pymssql.org/en/stable/), and initialize the connection from the script itself.
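A minimal sketch of initializing the connection from the script, assuming a SQL Server database and pymssql supplied to the job. The `--db_*` argument names are hypothetical job parameters (Glue passes job parameters to the script on the command line). `argparse.parse_known_args` stands in here for Glue's own `awsglue.utils.getResolvedOptions` helper and simply ignores any arguments it does not recognise.

```python
import argparse


def parse_db_args(argv):
    """Read connection settings passed as job arguments (hypothetical names)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--db_host", required=True)
    parser.add_argument("--db_port", type=int, default=1433)
    parser.add_argument("--db_name", required=True)
    parser.add_argument("--db_user", required=True)
    parser.add_argument("--db_password", required=True)
    # Glue may add arguments of its own; collect and ignore them.
    args, _unknown = parser.parse_known_args(argv)
    return args


def connect_mssql(args):
    """Open a SQL Server connection with pymssql, which must be supplied to the job."""
    import pymssql  # imported here so parse_db_args works without pymssql installed

    return pymssql.connect(
        server=args.db_host,
        port=str(args.db_port),
        user=args.db_user,
        password=args.db_password,
        database=args.db_name,
    )
```

In the job script this would be called as `conn = connect_mssql(parse_db_args(sys.argv[1:]))`.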

You may refer to the documentation on providing your own Python library: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#create-python-egg-library
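Once the library is packaged and uploaded, its S3 path is referenced from the job's `--extra-py-files` default argument. A small sketch of adding it to an existing `DefaultArguments` dict; the helper name and S3 paths are hypothetical, and the resulting dict would be passed as `DefaultArguments` to `create_job`/`update_job`.

```python
def with_extra_py_files(default_arguments, *s3_paths):
    """Return a copy of a job's DefaultArguments with extra Python packages appended."""
    args = dict(default_arguments)  # leave the caller's dict untouched
    existing = [p for p in args.get("--extra-py-files", "").split(",") if p]
    # Glue expects a comma-separated list of S3 paths (.egg or .whl files).
    args["--extra-py-files"] = ",".join(existing + list(s3_paths))
    return args


args = with_extra_py_files(
    {"--TempDir": "s3://my-bucket/tmp/"},
    "s3://my-bucket/libs/my_db_lib-0.1-py3-none-any.whl",
)
```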

Here is the code. Make sure the same connection is added to your Python Shell job in its CloudFormation template. We are using the pg8000 library.

import boto3
from urllib.parse import urlparse

import pg8000 as dbapi  # pg8000 must be provided to the job as an extra library


def get_connection(conn_name, region_id):
    """Open a pg8000 connection from the properties of a Glue catalog connection."""
    client = boto3.client('glue', region_name=region_id)
    # Don't print the response: it contains the password in clear text.
    response = client.get_connection(Name=conn_name)

    connection_properties = response['Connection']['ConnectionProperties']
    # e.g. jdbc:redshift://my-cluster.example.com:5439/dev
    url = urlparse(connection_properties['JDBC_CONNECTION_URL'][len('jdbc:'):])

    host = url.hostname
    port = url.port or 5439  # fall back to the Redshift default port
    database = url.path.lstrip('/')
    user = connection_properties['USERNAME']
    pwd = connection_properties['PASSWORD']

    # Recent pg8000 releases take ssl_context=True; older ones took ssl=True.
    rs_conn = dbapi.connect(database=database, host=host, port=port,
                            user=user, password=pwd, ssl_context=True)
    cur = rs_conn.cursor()
    # Limit each statement to 20 minutes (the value is in milliseconds).
    cur.execute("set statement_timeout = 1200000")
    rs_conn.commit()
    cur.close()
    return rs_conn
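To check the URL handling without calling AWS, the parsing step can be split into a pure function. This is a sketch that assumes a JDBC URL of the form `jdbc:<subprotocol>://host[:port]/database`, as used for Redshift and PostgreSQL; SQL Server URLs such as `jdbc:sqlserver://host:1433;databaseName=db` use a different syntax and would need their own parser. `connection_kwargs` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def connection_kwargs(connection_properties, default_port=5439):
    """Turn Glue ConnectionProperties into keyword arguments for pg8000.connect()."""
    jdbc_url = connection_properties["JDBC_CONNECTION_URL"]
    # Drop the "jdbc:" prefix so urlparse sees e.g. "redshift://host:port/db".
    if jdbc_url.startswith("jdbc:"):
        jdbc_url = jdbc_url[len("jdbc:"):]
    parsed = urlparse(jdbc_url)
    return {
        "host": parsed.hostname,
        "port": parsed.port or default_port,  # default when the URL omits a port
        "database": parsed.path.lstrip("/"),
        "user": connection_properties["USERNAME"],
        "password": connection_properties["PASSWORD"],
    }


props = {
    "JDBC_CONNECTION_URL": "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
    "USERNAME": "etl_user",
    "PASSWORD": "example",
}
kwargs = connection_kwargs(props)
```

The resulting dict can be passed to `pg8000.connect(**kwargs)` together with the SSL option.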
