Write data to timestreamDb from AWS Glue

Question

I am trying to use Glue streaming to write data to AWS TimestreamDB, but I am having a hard time configuring the JDBC connection.

The steps I am following are below, and this is the documentation link: https://docs.aws.amazon.com/timestream/latest/developerguide/JDBC.configuring.html

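I am uploading the jar to S3. There are multiple jars here and I tried each one of them. https://github.com/awslabs/amazon-timestream-driver-jdbc/releases
In the Glue job, I am pointing the jar lib path to the above S3 location.
In the job script, I am trying to read from Timestream using both Spark and Glue with the code below, but it is not working. Can someone explain what I am doing wrong here?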

Here is my code:

url = "jdbc:timestream://AccessKeyId=<myAccessKeyId>;SecretAccessKey=<mySecretAccessKey>;SessionToken=<mySessionToken>;Region=us-east-1"

source_df = sparkSession.read.format("jdbc").option("url",url).option("dbtable","IoT").option("driver","software.amazon.timestream.jdbc.TimestreamDriver").load()

datasink1 = glueContext.write_dynamic_frame.from_options(frame = applymapping0, connection_type = "jdbc", connection_options = {"url": url, "driver": "software.amazon.timestream.jdbc.TimestreamDriver", "database": "CovidTestDb", "dbtable": "CovidTestTable"}, transformation_ctx = "datasink1")

Answer 1

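As of now (April 2022), Timestream's JDBC driver does not support write operations (I looked through the code and saw a bunch of exceptions for unsupported writes). Reading data from Timestream with Glue is possible, though. The following steps worked for me: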

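Upload timestream-query and timestream-jdbc to an S3 bucket that you can reference in your Glue script
Make sure the script's IAM role has read access to the Timestream database and table
You don't need the access key and secret parameters in the JDBC URL; something like jdbc:timestream://Region=<timestream-db-region> should be enough
Specify the driver and fetchsize options: option("driver","software.amazon.timestream.jdbc.TimestreamDriver") option("fetchsize", "100") (adjust fetchsize to your needs)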
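For a Python Glue script like the one in the question, the last two steps above might be sketched roughly as follows. The helper names `timestream_reader_options` and `read_timestream` are made up for illustration and are not part of the driver or of Glue. The option keys are the usual Spark JDBC reader options, and the URL form and driver class are the ones given in this answer.

```python
def timestream_reader_options(region, query, fetchsize=100):
    """Build Spark JDBC reader options for Timestream (hypothetical helper).

    The URL carries only the region; credentials are expected to come
    from the job's IAM role rather than from URL parameters.
    """
    return {
        "url": f"jdbc:timestream://Region={region}",
        "driver": "software.amazon.timestream.jdbc.TimestreamDriver",
        "query": query,
        "fetchsize": str(fetchsize),  # Spark passes reader options as strings
    }


def read_timestream(spark, region, query, fetchsize=100):
    """Run `query` against Timestream; `spark` is an existing SparkSession."""
    options = timestream_reader_options(region, query, fetchsize)
    return spark.read.format("jdbc").options(**options).load()


# Possible usage inside a Glue job (assumes `glueContext` already exists):
# df = read_timestream(glueContext.spark_session, "us-east-1",
#                      "select * from db.tbl where time between ago(15m) and now()")
```

Keeping the options in a plain dict makes it easy to check that no access keys end up in the URL before the job runs.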

Here is a complete example of reading a DataFrame from Timestream:

val df = sparkSession.read.format("jdbc")
      .option("url", "jdbc:timestream://Region=us-east-1")
      .option("driver","software.amazon.timestream.jdbc.TimestreamDriver")
      // optionally add a query to narrow the data to fetch
      .option("query", "select * from db.tbl where time between ago(15m) and now()")
      .option("fetchsize", "100")
      .load()
df.write.format("console").save()
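Since the JDBC driver rejects writes, one possible route for the write side (not something I tested as part of the steps above, so treat it as a sketch) is to call the Timestream write API through boto3 from each Spark partition. The function names, the single numeric measure, and the assumption that the time column holds epoch milliseconds are all illustrative. `write_records` is the boto3 `timestream-write` client call, which accepts at most 100 records per request.

```python
from itertools import islice

MAX_RECORDS_PER_CALL = 100  # WriteRecords accepts at most 100 records per request


def to_record(row, dimension_cols, measure_col, time_col):
    """Convert one row (a dict) into a Timestream record dict.

    Assumption: `time_col` holds epoch milliseconds and the measure is numeric.
    """
    return {
        "Dimensions": [{"Name": c, "Value": str(row[c])} for c in dimension_cols],
        "MeasureName": measure_col,
        "MeasureValue": str(row[measure_col]),
        "MeasureValueType": "DOUBLE",
        "Time": str(int(row[time_col])),
        "TimeUnit": "MILLISECONDS",
    }


def chunks(records, size=MAX_RECORDS_PER_CALL):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_partition(rows, database, table, dimension_cols, measure_col, time_col, region):
    """Write one partition of Spark Row objects to Timestream."""
    import boto3  # imported here so each executor builds its own client

    client = boto3.client("timestream-write", region_name=region)
    records = (to_record(r.asDict(), dimension_cols, measure_col, time_col) for r in rows)
    for chunk in chunks(records):
        client.write_records(DatabaseName=database, TableName=table, Records=chunk)


# Possible usage; the column names are placeholders, not from the question:
# df.foreachPartition(lambda rows: write_partition(
#     rows, "CovidTestDb", "CovidTestTable", ["device"], "temp", "ts", "us-east-1"))
```

The job's IAM role would also need write permissions on the table for this to work, in addition to the read access mentioned above.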

Hope this helps.
