I have a parquet file in S3 with several columns, one of which is JSON. I have the same structure in an RDS (PostgreSQL) database, where that column is jsonb.
I would like to copy the parquet file into RDS, but how do I cast the column to the jsonb data type, since Glue doesn't support a json column type? When I try to insert the column as a string, I get an error. Any ideas on how I can load a JSON column into an RDS jsonb column?
An error occurred while calling o145.pyWriteDynamicFrame. ERROR: column "json_column" is of type jsonb but expression is of type character varying
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
DataSource0 = glueContext.create_dynamic_frame.from_options(connection_type = "s3", format = "parquet", connection_options = {"paths": ["s3://folder"], "recurse":True}, transformation_ctx = "DataSource0")
Transform0 = ApplyMapping.apply(frame = DataSource0, mappings = [("id", "long", "id", "long"), ("name", "string", "name", "string"), ("json_column", "string", "json_column", "string")], transformation_ctx = "Transform0")
DataSink0 = glueContext.write_dynamic_frame.from_catalog(frame = Transform0, database = "postgres", table_name = "table", transformation_ctx = "DataSink0")
job.commit()
One path would be to connect to your RDS instance using psycopg2, iterate over your dataset, and load it directly.
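A minimal sketch of that approach, reusing the Transform0 DynamicFrame from your script: convert it to a Spark DataFrame, pull the rows, and insert them with psycopg2, casting the string placeholder to jsonb in the INSERT. The host, credentials, and table/column names below are placeholders you would replace with your own; you may also need to make psycopg2 available to the Glue job (for example via the --additional-python-modules job parameter).

import psycopg2
from psycopg2.extras import execute_values

# Convert the DynamicFrame to a Spark DataFrame and bring the rows to the driver.
# This is fine for modest datasets; for large ones, insert per partition instead
# (e.g. with foreachPartition) to avoid collecting everything at once.
rows = Transform0.toDF().collect()

# Hypothetical connection details -- replace with your RDS endpoint and credentials.
conn = psycopg2.connect(
    host="my-rds-endpoint.rds.amazonaws.com",
    dbname="postgres",
    user="my_user",
    password="my_password",
)

with conn, conn.cursor() as cur:
    # The ::jsonb cast on the placeholder is what converts the string value
    # into the jsonb column type, avoiding the "character varying" error.
    execute_values(
        cur,
        "INSERT INTO my_table (id, name, json_column) VALUES %s",
        [(r["id"], r["name"], r["json_column"]) for r in rows],
        template="(%s, %s, %s::jsonb)",
    )

The explicit cast in the statement (or equivalently wrapping the value with psycopg2.extras.Json) is the key difference from the Glue sink, which sends the column as plain character varying.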