I have a DynamoDB table with some attributes which are Document Types : list
and map
, to be specific.
I'm trying to export these tables into Redshift. According to the documentation , the Redshift copy
command will fail if you try to copy data from a DynamoDB table which includes either list
or map
attributes:
Only Amazon DynamoDB attributes with scalar STRING and NUMBER data types are supported. The Amazon DynamoDB BINARY and SET data types are not supported. If a COPY command tries to load an attribute with an unsupported data type, the command will fail.
I've also investigated using a Data Pipeline and Hive, but the documentation suggests that Hive doesn't support list
or map
either.
What are the options for moving data out of a DynamoDB - including list
and map
attributes - and into Redshift?
The approach I've taken is to start with the Export DynamoDB Table to S3 template.
The org.apache.hadoop.dynamodb.tools.DynamoDbExport
step which runs in the EMRActivity
seems to output a manifest file specifically meant for use with the RedshiftCopyActivity
: http://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html
So in addition to the S3BackupLocation
data node which comes with the "Export DynamoDB Table to S3" template, I've added another S3DataNode
which dependsOn
the EMR activity. It specifies a manifestFilePath
, which is the same as the directoryPath
specified in the S3BackupLocation
data node - except with /manifest
added to the end.
That new data node - with the manifestFilePath
- is then used as input for a RedshiftCopyActivity
. To flatten attributes which are maps, the RedshiftCopyActivity
specifies some commandOptions
which provides JSONPath options :
"commandOptions": "JSON 's3://richard.jp.leguen/data-pipeline/misc/jsonpaths.txt'"
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.