
Using custom connector in AWS Glue ETL script

I am working on an AWS Glue ETL script in Python, using Glue's dynamic frame abstraction.

I created a JDBC connection resource named sap-lpr-connection in the Glue Data Catalog and would like to use it to retrieve the connection options from the code.

As per this link (and other sources), I should be using a "custom.jdbc" connection_type to access the connection resource I created.

This is what my code looks like:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

# DATABASE
database = 'sap_lpr'
table = 'bsim'

# GLUE CONTEXT
glue_context = GlueContext(SparkContext.getOrCreate())

# CONNECTION OPTIONS
connection_options = {
    "connectionName": f"{database.replace('_', '-')}-connection",
    "dbTable": table
}

# READ DATA
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="custom.jdbc",
    connection_options=connection_options
)

But when I run the code I get this error:

An error occurred while calling o81.getSource. Glue ETL Marketplace: Can not retrieve required field CONNECTOR_TYPE.

I know an alternative would be to specify a "jdbc" connection_type and pass the various connection options such as the JDBC URL, username and password, but I would prefer to retrieve that information from the Glue connection resource I created for this purpose.

Also, I would really like to stick to the glue_context API as opposed to the standard Spark API.

Any idea what I might be doing wrong?

OK, it turns out that I misunderstood the type of connector I was using.

I created a connection resource in the AWS Glue Data Catalog using a "standard" connector, the JDBC one. This is not considered a custom connector type for the connection_type field. It is a standard JDBC connection, which you specify like so, for example: connection_type='sqlserver'.

So if you create a connection using one of the standard connectors, such as JDBC, you have to use the .extract_jdbc_conf() method to extract the configuration from the connection resource:

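# Name of the connection resource in the Glue Data Catalog
connection_name = "sap-lpr-connection"

# Extract the JDBC settings stored in the connection resource
configuration = glue_context.extract_jdbc_conf(
    connection_name,
    catalog_id=None
)

connection_options = {
    "url": configuration["url"],
    "user": configuration["user"],
    "password": configuration["password"]
}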
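
To tie this back to the original read, here is a minimal sketch of how the extracted configuration might be passed to create_dynamic_frame.from_options with a standard connection type. The helper names build_connection_options and read_table are hypothetical. The "dbtable" option key and the preference for "fullUrl" over "url" are assumptions based on my reading of the Glue documentation, so check them against your Glue version. The "sqlserver" default only mirrors the example above.

```python
from typing import Any, Dict


def build_connection_options(conf: Dict[str, Any], table: str) -> Dict[str, Any]:
    """Map the dict returned by extract_jdbc_conf() to JDBC connection options."""
    return {
        # Assumption: prefer "fullUrl" (the URL as entered on the connection,
        # including the database name) when present, else fall back to "url".
        "url": conf.get("fullUrl", conf["url"]),
        "user": conf["user"],
        "password": conf["password"],
        # Assumption: standard JDBC connection types expect "dbtable".
        "dbtable": table,
    }


def read_table(connection_name: str, table: str, connection_type: str = "sqlserver"):
    """Read a table into a DynamicFrame. This part only runs inside a Glue job."""
    # Imported lazily so the helper above can be exercised outside Glue.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    conf = glue_context.extract_jdbc_conf(connection_name)
    return glue_context.create_dynamic_frame.from_options(
        connection_type=connection_type,
        connection_options=build_connection_options(conf, table),
    )
```

Inside a Glue job this would be called as dyf = read_table("sap-lpr-connection", "bsim"), replacing the default connection_type with the one that matches your database.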


 