
AWS Glue Crawler for JSONB column in PostgreSQL RDS

I've created a crawler that looks at a PostgreSQL 9.6 RDS table with a JSONB column, but the crawler identifies the column type as "string". When I then try to create a job that loads data from a JSON file on S3 into the RDS table, I get an error.

How can I map a JSON file source to a JSONB target column?

It's not quite a direct copy, but an approach that has worked for me is to define the column on the target table as TEXT. After the Glue job populates the field, I then convert it to JSONB. For example:

alter table postgres_table
 alter column column_with_json set data type jsonb using column_with_json::jsonb;

Note the use of the cast for the existing text data. Without that, the alter column would fail.
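If several TEXT columns need the same conversion after a load, the statement can be generated rather than typed by hand. The following is a minimal sketch, not part of Glue or any PostgreSQL client library: the helper names quote_ident and build_jsonb_conversion are hypothetical, and the script only builds the SQL text, which you would still run yourself with your usual client.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_jsonb_conversion(table: str, columns: List[str]) -> str:
    """Build one ALTER TABLE statement that converts TEXT columns to JSONB.

    Each column gets its own 'using <column>::jsonb' cast, as in the
    hand-written statement above.
    """
    if not columns:
        raise ValueError("at least one column is required")
    clauses = [
        f"alter column {quote_ident(c)} set data type jsonb using {quote_ident(c)}::jsonb"
        for c in columns
    ]
    return f"alter table {quote_ident(table)}\n " + ",\n ".join(clauses) + ";"


# Table and column names are the ones used in the example above.
print(build_jsonb_conversion("postgres_table", ["column_with_json"]))
```

Quoted identifiers are case-sensitive in PostgreSQL, so pass the names exactly as they are stored in the database.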

The crawler will identify the JSONB column type as "string", but you can use the Unbox class in Glue to parse that column as JSON.

Consider the following table in PostgreSQL:

create table persons (id integer, person_data jsonb, creation_date timestamp);

Here is an example of one record from the persons table:

ID = 1
PERSON_DATA = {
               "firstName": "Sergii",
               "age": 99,
               "email":"Test@test.com"
               }
CREATION_DATE = 2021-04-15 00:18:06

The following code needs to be added to the Glue job:

from awsglue.transforms import Unbox

# 1. Create a DynamicFrame from the catalog table (glueContext comes from the usual Glue job setup).
df_persons = glueContext.create_dynamic_frame.from_catalog(database="testdb", table_name="persons", transformation_ctx="df_persons")
# 2. In path, give the name of the jsonb column (read as a string) that should be parsed as JSON.
df_persons_json = Unbox.apply(frame=df_persons, path="person_data", format="json")
# 3. Convert the DynamicFrame to a Spark DataFrame.
datf_persons_json = df_persons_json.toDF()

# 4. The column can now be processed as structured JSON data; each JSON element can also be selected as a separate DataFrame column.
final_df_person = datf_persons_json.select("id", "person_data.age", "person_data.firstName", "creation_date")
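To see what steps 2 and 4 do to a single record without a Glue environment, here is a rough plain-Python analogy using the sample row above. It is only an illustration of the idea, not how Glue or Spark implement it; the functions unbox and select are hypothetical stand-ins written for this sketch.

```python
import json

# One row as the job would first see it: person_data arrives as a plain string.
row = {
    "id": 1,
    "person_data": '{"firstName": "Sergii", "age": 99, "email": "Test@test.com"}',
    "creation_date": "2021-04-15 00:18:06",
}


def unbox(record: dict, path: str) -> dict:
    """Return a copy of record with the JSON string at `path` parsed into a dict."""
    parsed = dict(record)
    parsed[path] = json.loads(record[path])
    return parsed


def select(record: dict, columns: list) -> dict:
    """Pick columns; a dotted name such as 'person_data.age' reads a nested field.

    The output key is the last path segment, so 'person_data.age' becomes 'age'.
    """
    out = {}
    for col in columns:
        value = record
        for part in col.split("."):
            value = value[part]
        out[col.split(".")[-1]] = value
    return out


unboxed = unbox(row, "person_data")
final = select(unboxed, ["id", "person_data.age", "person_data.firstName", "creation_date"])
print(final)
# {'id': 1, 'age': 99, 'firstName': 'Sergii', 'creation_date': '2021-04-15 00:18:06'}
```

Before unboxing, a nested name like person_data.age cannot be resolved because person_data is just text; after parsing, each JSON element is addressable on its own.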

You can also check the following link:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-Unbox.html
