
Writing data to Amazon Timestream from AWS Glue

I'm trying to use Glue streaming to write data to Amazon Timestream, but I'm having a hard time configuring the JDBC connection.

The steps I'm following are below, along with the documentation link: https://docs.aws.amazon.com/timestream/latest/developerguide/JDBC.configuring.html

  1. I'm uploading the jar to S3. There are multiple jars in the release and I tried each one of them: https://github.com/awslabs/amazon-timestream-driver-jdbc/releases
  2. In the Glue job I'm pointing the jar lib path to the above S3 location.
  3. In the job script I'm trying to read from and write to Timestream using both Spark and Glue with the code below, but it's not working. Can someone explain what I'm doing wrong here?

This is my code:

url = "jdbc:timestream://AccessKeyId=<myAccessKeyId>;SecretAccessKey=<mySecretAccessKey>;SessionToken=<mySessionToken>;Region=us-east-1"

source_df = sparkSession.read.format("jdbc") \
    .option("url", url) \
    .option("dbtable", "IoT") \
    .option("driver", "software.amazon.timestream.jdbc.TimestreamDriver") \
    .load()

datasink1 = glueContext.write_dynamic_frame.from_options(
    frame=applymapping0,
    connection_type="jdbc",
    connection_options={
        "url": url,
        "driver": "software.amazon.timestream.jdbc.TimestreamDriver",
        "database": "CovidTestDb",
        "dbtable": "CovidTestTable",
    },
    transformation_ctx="datasink1",
)

As of this date (April 2022) there is no support for write operations in Timestream's JDBC driver (I reviewed the code and saw a number of "no write support" exceptions). It is possible to read data from Timestream using Glue, though. The following steps worked for me:

  • Upload the timestream-query and timestream-jdbc jars to an S3 bucket that you can reference in your Glue script
  • Ensure that the IAM role for the script has read access to the Timestream database and table
  • You don't need the access key and secret parameters in the JDBC URL; something like jdbc:timestream://Region=<timestream-db-region> should be enough
  • Specify the driver and fetchsize options: option("driver", "software.amazon.timestream.jdbc.TimestreamDriver") and option("fetchsize", "100") (tweak the fetchsize according to your needs)
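Since the question uses a Python Glue job, the options from the bullets above can be collected into a small helper for PySpark. This is just a sketch: the helper name `timestream_jdbc_options` is mine, and the `db.tbl` names in the default query are placeholders.

```python
def timestream_jdbc_options(region="us-east-1", fetchsize=100,
                            query="select * from db.tbl where time between ago(15m) and now()"):
    """Build the options for spark.read.format('jdbc') against the
    Timestream JDBC driver: no credentials in the URL, driver class,
    and a fetchsize tuned to your needs."""
    return {
        "url": f"jdbc:timestream://Region={region}",
        "driver": "software.amazon.timestream.jdbc.TimestreamDriver",
        "query": query,
        "fetchsize": str(fetchsize),
    }

# In the Glue job (the SparkSession `spark` is provided by the Glue boilerplate):
# df = spark.read.format("jdbc").options(**timestream_jdbc_options()).load()
# df.show()
```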

Following is a complete example of reading a dataframe from timestream:

val df = sparkSession.read.format("jdbc")
      .option("url", "jdbc:timestream://Region=us-east-1")
      .option("driver","software.amazon.timestream.jdbc.TimestreamDriver")
      // optionally add a query to narrow the data to fetch
      .option("query", "select * from db.tbl where time between ago(15m) and now()")
      .option("fetchsize", "100")
      .load()
df.write.format("console").save()
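Since the JDBC driver rejects writes, one workaround (not part of the answer above, just a sketch) is to write from the Glue Python job with boto3's timestream-write client instead. The database and table names below are the ones from the question; the record layout (a `device_id` dimension with a numeric measure) is an assumed example.

```python
import time

def build_records(rows):
    """Convert (device_id, measure_name, value) tuples into the Records
    structure expected by timestream-write's write_records call."""
    now_ms = str(int(time.time() * 1000))  # milliseconds since epoch
    return [
        {
            "Dimensions": [{"Name": "device_id", "Value": device_id}],
            "MeasureName": measure_name,
            "MeasureValue": str(value),
            "MeasureValueType": "DOUBLE",
            "Time": now_ms,
        }
        for device_id, measure_name, value in rows
    ]

def write_rows(rows, database="CovidTestDb", table="CovidTestTable"):
    """Write rows to Timestream via the timestream-write API.
    Requires timestream:WriteRecords permission on the Glue job's role."""
    import boto3  # imported here so build_records stays usable without AWS deps
    client = boto3.client("timestream-write")
    records = build_records(rows)
    # write_records accepts at most 100 records per call
    for i in range(0, len(records), 100):
        client.write_records(
            DatabaseName=database,
            TableName=table,
            Records=records[i:i + 100],
        )
```

To push a Glue DynamicFrame this way, convert it with `toDF().collect()` (fine for small batches; for large volumes a `foreachPartition` write would be more appropriate).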

Hope this helps.
