
AWS Glue Job - Load parquet file from S3 to RDS jsonb column

I have a Parquet file in S3 that has several columns, one of which contains JSON. The target table in my RDS (PostgreSQL) database has the same layout, with that column typed as jsonb.

I would like to copy the Parquet file to RDS, but how do I cast that column to the jsonb data type, given that Glue doesn't support a json column type? When I try to insert the column as a string, I get the following error. Any ideas on how I can load a JSON column into an RDS jsonb column?

An error occurred while calling o145.pyWriteDynamicFrame. ERROR: column "json_column" is of type jsonb but expression is of type character varying

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the Parquet data from S3 into a DynamicFrame
DataSource0 = glueContext.create_dynamic_frame.from_options(connection_type = "s3", format = "parquet", connection_options = {"paths": ["s3://folder"], "recurse":True}, transformation_ctx = "DataSource0")

# Map source fields to target fields; the JSON column can only be declared as a string here
Transform0 = ApplyMapping.apply(frame = DataSource0, mappings = [("id", "long", "id", "long"), ("name", "string", "name", "string"), ("json_column", "string", "json_column", "string")], transformation_ctx = "Transform0")

# Write to the PostgreSQL table registered in the Glue Data Catalog -- this is the call that fails
DataSink0 = glueContext.write_dynamic_frame.from_catalog(frame = Transform0, database = "postgres", table_name = "table", transformation_ctx = "DataSink0")
job.commit()

One path would be to connect to your RDS instance directly with psycopg2, iterate over your dataset, and load the rows yourself, as in the sketch below the linked question.

How to insert JSONB into Postgresql with Python?
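
A minimal sketch of that approach, assuming the job above has already produced Transform0; the table name my_table, the connection details, and the batching choice are placeholders/assumptions, not details from the question:

import json
import psycopg2
from psycopg2.extras import Json, execute_values

# Convert the DynamicFrame to a Spark DataFrame and pull the rows to the driver.
# (For a large dataset you would batch or stream instead of collect() -- this is only a sketch.)
rows = Transform0.toDF().collect()

# Connection details are placeholders -- in a real job read them from a Glue connection
# or AWS Secrets Manager rather than hard-coding them.
conn = psycopg2.connect(
    host="my-rds-endpoint.amazonaws.com",
    dbname="postgres",
    user="my_user",
    password="my_password",
)

with conn, conn.cursor() as cur:
    # Json() wraps the parsed value so it is rendered as a JSON literal that PostgreSQL
    # coerces to jsonb, rather than the character varying parameter that caused the error.
    values = [
        (row["id"], row["name"], Json(json.loads(row["json_column"])))
        for row in rows
    ]
    execute_values(
        cur,
        "INSERT INTO my_table (id, name, json_column) VALUES %s",
        values,
    )

conn.close()

If the column already contains valid JSON text, an explicit cast in the SQL (%s::jsonb) would also work instead of parsing it in Python. Note that psycopg2 is not available in the Glue environment out of the box, so it has to be supplied to the job (for example via the --additional-python-modules job parameter on Glue 2.0+).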
