
Saving json from aws-glue into postgres, jsonb type column

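I am using an AWS Glue script to move data from S3 to a Postgres RDS instance. One column in the Postgres database (images) has the type jsonb.

Is it possible to convert a string to JSON so that the Glue script can save it into the jsonb column?

This is the script I am using in AWS Glue: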

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "test_database", table_name = "s3_source", transformation_ctx = "datasource0")
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("id", "string", "id", "string"), ("title", "string", "title", "string"),  ("images", "string", "images", "string")], transformation_ctx = "applymapping1")
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["images", "title", "id"], transformation_ctx = "selectfields2")
resolvechoice3 = ResolveChoice.apply(frame = selectfields2, choice = "MATCH_CATALOG", database = "test_database", table_name ="rds_target", transformation_ctx = "resolvechoice3")
resolvechoice4 = ResolveChoice.apply(frame = resolvechoice3, choice = "make_cols", transformation_ctx = "resolvechoice4")
datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = resolvechoice4, database = "test_database", table_name = "rds_target", transformation_ctx = "datasink5")
job.commit()
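
Whichever write path ends up being used, Postgres only accepts text for a jsonb column if that text is valid JSON. The following is a minimal sketch of a pre-flight check, assuming the images values arrive as JSON-encoded strings; `is_valid_json` and `sample_rows` are hypothetical names and data used for illustration only and do not come from the original script.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON; None is treated as invalid here."""
    if text is None:
        return False
    try:
        json.loads(text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Hypothetical rows shaped like the id/title/images mapping in the script
sample_rows = [
    {"id": "1", "title": "ok", "images": '[{"url": "a.png"}]'},
    {"id": "2", "title": "bad", "images": "{not json"},
]

# Collect the ids whose images value would be rejected as malformed JSON
bad_ids = [row["id"] for row in sample_rows if not is_valid_json(row["images"])]
print(bad_ids)
```

Note that Python's json module is slightly more permissive than Postgres (for example, it accepts NaN by default), so this check is a first filter rather than a guarantee.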

Managed to solve it thanks to https://stackoverflow.com/a/65821468/2797747

I replaced my old write_dynamic_frame call:

datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = my_dyn_frame, catalog_connection = "mydb", connection_options = {"dbtable": "mytable", "database": "mydb"}, transformation_ctx = "datasink4")

with:

df = my_dyn_frame.toDF()

url = 'jdbc:postgresql://<path>:5432/<database>'

properties = {'user':'*****',
              'password':'*****',
              'driver': "org.postgresql.Driver",
              'stringtype':"unspecified"}

df.write.jdbc(url, table="mytable", mode="append", properties=properties)
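
Why this works: with `stringtype` set to `unspecified`, the PostgreSQL JDBC driver sends string parameters to the server untyped instead of as varchar, so Postgres can infer the jsonb type from the target column. Below is a minimal sketch that packages the same settings into a helper. `build_jdbc_target` and the host, database, and credential values are hypothetical placeholders, not part of the original answer.

```python
def build_jdbc_target(host, database, user, password, port=5432):
    """Build the JDBC URL and properties for a Postgres write (hypothetical helper)."""
    url = "jdbc:postgresql://{}:{}/{}".format(host, port, database)
    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        # Send strings untyped so the server can infer jsonb from the column
        "stringtype": "unspecified",
    }
    return url, properties


# Placeholder values for illustration; substitute real connection details
url, properties = build_jdbc_target(
    "db.example.internal", "mydb", "glue_user", "secret"
)
print(url)

# In the Glue job, the result would be passed to the same call as above:
# df.write.jdbc(url, table="mytable", mode="append", properties=properties)
```

In a real job the credentials would more sensibly come from the Glue connection or a secrets store than from literals in the script.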

