
AWS Glue Job - Load parquet file from S3 to RDS jsonb column

I have a parquet file in S3 with several columns, one of which contains JSON. I have a table with the same layout in an RDS database, where that column is typed jsonb.

I would like to copy the parquet file to RDS, but how do I cast that column to the jsonb data type, given that Glue doesn't support a json column type? When I try to insert the column as a string, I get an error. Any ideas on how I can load a JSON column into an RDS jsonb column?

An error occurred while calling o145.pyWriteDynamicFrame. ERROR: column "json_column" is of type jsonb but expression is of type character varying
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

DataSource0 = glueContext.create_dynamic_frame.from_options(connection_type = "s3", format = "parquet", connection_options = {"paths": ["s3://folder"], "recurse":True}, transformation_ctx = "DataSource0")
Transform0 = ApplyMapping.apply(frame = DataSource0, mappings = [("id", "long", "id", "long"), ("name", "string", "name", "string"), ("json_column", "string", "json_column", "string")], transformation_ctx = "Transform0")

DataSink0 = glueContext.write_dynamic_frame.from_catalog(frame = Transform0, database = "postgres", table_name = "table", transformation_ctx = "DataSink0")
job.commit()

One path would be to connect to your RDS instance using psycopg2, iterate over your dataset, and load it directly.

How to insert JSONB into Postgresql with Python?
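A minimal sketch of that approach, assuming the table and column names from the question (`id`, `name`, `json_column`) and a hypothetical target table `my_table`; an explicit `::jsonb` cast in the INSERT statement lets Postgres convert the string parameter, which avoids the "expression is of type character varying" error. Note that psycopg2 is not bundled in the Glue job environment by default, so it would need to be packaged as an extra Python library.

```python
# Hedged sketch: insert rows into an RDS Postgres jsonb column with psycopg2.
# "my_table" and the connection parameters are placeholders, not from the source.
import json

# The ::jsonb cast is the key part: the parameter arrives as text and the
# server converts it to jsonb.
INSERT_SQL = (
    "INSERT INTO my_table (id, name, json_column) "
    "VALUES (%s, %s, %s::jsonb)"
)

def row_to_params(row):
    """Turn a row (dict-like, e.g. from DynamicFrame.toDF().collect())
    into a psycopg2 parameter tuple, serializing json_column if needed."""
    value = row["json_column"]
    if not isinstance(value, str):
        # Serialize dicts/lists; strings are assumed to already be JSON text.
        value = json.dumps(value)
    return (row["id"], row["name"], value)

def load_rows(rows, conn_kwargs):
    """Insert all rows into RDS. conn_kwargs are the usual psycopg2
    connection parameters (host, dbname, user, password, ...)."""
    import psycopg2  # imported lazily; only needed when actually loading
    with psycopg2.connect(**conn_kwargs) as conn:
        with conn.cursor() as cur:
            cur.executemany(INSERT_SQL, [row_to_params(r) for r in rows])
        conn.commit()
```

For large datasets this row-by-row path is slower than a bulk write, but it gives full control over the cast to jsonb.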
