
AWS Glue Python Shell connection with Oracle

I want to connect to Oracle from an AWS Glue Python shell job (not Spark). I got everything described in the linked question below working on a dev endpoint and on my own virtual machine, but my goal is to run it in an AWS Glue Python shell job.

Linked question: Connection with Oracle cx_Oracle problem with AWS Glue Python Shell

All libraries in an AWS Glue Python shell job must be supplied as .whl or .egg packages, which are then installed. But AWS Glue is serverless, and I could not find where they are installed, so I cannot set rpath correctly.

How can I find absolute_path_to_library_dir?

Because Glue is serverless, there is no fixed /path/to/library/dir to point at.
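The interpreter still has to load an uploaded package from somewhere on the job's filesystem, and that location can be printed at run time, although it is undocumented and may differ between runs. A minimal sketch using only the standard library; `module_dir` is a hypothetical helper name, and `json` stands in for `cx_Oracle` so the example runs anywhere:

```python
import importlib.util
import os


def module_dir(module_name):
    """Return the directory a top-level module would be imported from.

    Returns None if the module cannot be found or has no file location
    (for example, a built-in module).
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        return None
    return os.path.dirname(os.path.abspath(spec.origin))


# Inside the Glue job you would pass "cx_Oracle" instead of "json".
print(module_dir("json"))
```

Treat the printed path as diagnostic output only; nothing in the source suggests it is stable enough to hard-code into an rpath.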

Python processes in Glue need a few things in order to connect to an external database such as your Oracle server.

  1. Python libraries must be packaged as an .egg or .whl and uploaded to S3, and the location of these files must be specified when creating the job (the Python Library Path field). This applies to any library that you author, or that you would normally pip install, that is not already available in the environment AWS provides for Glue processes. So you need to build an .egg for cx_Oracle locally, upload it to S3, and provide its path in Python Library Path when creating your job. If you have already created the job, you can edit it and provide the s3-path-to-cx-oracle.egg there.

  2. Secrets such as connection credentials must be fetched from a secure external service by the ETL script. One option is to store the Oracle connection credentials in Glue: from the AWS Glue console, go to Connections, add a JDBC connection, and save your database credentials.

  3. In your ETL script, use boto3.client('glue').get_connection to retrieve the connection details and, using the uploaded cx_Oracle library, connect to the database. Here is an example snippet that you would need to adapt and include in your ETL script.

snippet:

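import boto3
import cx_Oracle as orcl

# Fetch the connection saved in the Glue console (step 2).
glue = boto3.client('glue')
resp = glue.get_connection(Name='my-oracle-connection')
props = resp['Connection']['ConnectionProperties']

# Keep only the part after '//' and drop a leading '@', so that a URL like
# jdbc:oracle:thin://@host:port/service_name becomes host:port/service_name.
dsn = props['JDBC_CONNECTION_URL'].split('//')[-1].lstrip('@')
user = props['USERNAME']
pw = props['PASSWORD']

# `db` should now be a connection to your Oracle database.
db = orcl.connect(user, pw, dsn)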
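Step 1 can also be done from code rather than the console. The sketch below assumes the Python Library Path field corresponds to the `--extra-py-files` job argument and that the job is created with boto3's `create_job`; the job name, role ARN, bucket, and file names are all hypothetical placeholders, and `build_job_args` is an illustrative helper, not part of any library:

```python
def build_job_args(job_name, role_arn, script_s3_path, library_s3_paths):
    """Build keyword arguments for glue.create_job for a Python shell job."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",          # Python shell job, not Spark
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Comma-separated S3 paths of the uploaded .egg/.whl files.
            "--extra-py-files": ",".join(library_s3_paths),
        },
    }


def create_job(job_args):
    """Create the job; needs boto3 and AWS credentials, so it is not called here."""
    import boto3
    return boto3.client("glue").create_job(**job_args)


# Hypothetical values for illustration only.
job_args = build_job_args(
    job_name="my-oracle-job",
    role_arn="arn:aws:iam::123456789012:role/my-glue-role",
    script_s3_path="s3://my-bucket/scripts/etl.py",
    library_s3_paths=["s3://my-bucket/libs/cx_Oracle.egg"],
)
print(job_args["DefaultArguments"]["--extra-py-files"])
```

Keeping the argument construction separate from the API call makes it possible to review what will be sent before anything is created in the account.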
