
Exporting DynamoDB document types to Redshift

I have a DynamoDB table with some attributes which are document types: list and map, to be specific.

I'm trying to export these tables into Redshift. According to the documentation, the Redshift COPY command will fail if you try to copy data from a DynamoDB table that includes either list or map attributes:

Only Amazon DynamoDB attributes with scalar STRING and NUMBER data types are supported. The Amazon DynamoDB BINARY and SET data types are not supported. If a COPY command tries to load an attribute with an unsupported data type, the command will fail.

I've also investigated using a Data Pipeline and Hive, but the documentation suggests that Hive doesn't support list or map either.

What are the options for moving data, including list and map attributes, out of DynamoDB and into Redshift?

The approach I've taken is to start with the Export DynamoDB Table to S3 template.

The org.apache.hadoop.dynamodb.tools.DynamoDbExport step, which runs in the EmrActivity, seems to output a manifest file specifically meant for use with the RedshiftCopyActivity: http://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html

So in addition to the S3BackupLocation data node which comes with the "Export DynamoDB Table to S3" template, I've added another S3DataNode which dependsOn the EMR activity. It specifies a manifestFilePath, which is the same as the directoryPath specified in the S3BackupLocation data node, except with /manifest added to the end.
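The relationship between the two paths can be sketched in Python as below. This is only an illustration of the data node described above, not the full pipeline definition: the node id `S3ManifestLocation`, the activity id `TableBackupActivity`, and the bucket path are assumptions, so substitute the ids and paths from your own pipeline.

```python
import json


def manifest_path_for(directory_path: str) -> str:
    """Append '/manifest' to the export directory path, avoiding a double slash."""
    return directory_path.rstrip("/") + "/manifest"


def manifest_data_node(directory_path: str, activity_id: str) -> dict:
    """Build a pipeline object for the extra S3DataNode that points at the manifest."""
    return {
        "id": "S3ManifestLocation",    # hypothetical id, pick your own
        "name": "S3ManifestLocation",  # hypothetical name
        "type": "S3DataNode",
        "manifestFilePath": manifest_path_for(directory_path),
        # Wait for the EMR export activity before this node is considered ready.
        "dependsOn": {"ref": activity_id},
    }


# Hypothetical values: use the directoryPath of your S3BackupLocation node
# and the id of the EMR activity in your pipeline.
node = manifest_data_node("s3://example-bucket/dynamodb-export/", "TableBackupActivity")
print(json.dumps(node, indent=2))
```

The printed object can be pasted into the `objects` list of the pipeline definition once the ids match the rest of the pipeline.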

That new data node, with the manifestFilePath, is then used as input for a RedshiftCopyActivity. To flatten attributes which are maps, the RedshiftCopyActivity specifies commandOptions which point COPY at a JSONPaths file:

"commandOptions": "JSON 's3://richard.jp.leguen/data-pipeline/misc/jsonpaths.txt'"
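For reference, a JSONPaths file is a JSON document with a single `jsonpaths` array, where each expression supplies the value for one column of the target table, in column order. The sketch below generates such a file and the matching `commandOptions` string. The attribute names (`id`, `address`, `city`, `zip`) are hypothetical, and the `s` and `m` type-descriptor keys are an assumption about how the export wraps values; inspect an exported file and adjust the paths to whatever layout it actually has.

```python
import json


def build_jsonpaths(paths):
    """Serialize JSONPath expressions in the {"jsonpaths": [...]} layout COPY reads."""
    return json.dumps({"jsonpaths": list(paths)}, indent=2)


def copy_command_options(jsonpaths_s3_uri: str) -> str:
    """Build the commandOptions value that points COPY at the JSONPaths file."""
    return "JSON '{}'".format(jsonpaths_s3_uri)


# Hypothetical paths: one entry per target column, in column order.
# Nested map attributes are flattened by reaching into the map.
paths = [
    "$.id.s",
    "$.address.m.city.s",
    "$.address.m.zip.s",
]

jsonpaths_text = build_jsonpaths(paths)
print(jsonpaths_text)  # upload this text to S3 as jsonpaths.txt
print(copy_command_options("s3://richard.jp.leguen/data-pipeline/misc/jsonpaths.txt"))
```

Each map attribute you want in Redshift needs one path per nested field, so the target table ends up with one column per flattened field.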
