
copy command for json.gz from s3 to redshift

We have stored the results of a DDB export to S3 as json.gz files. We want to load those into Redshift using the COPY command. We don't want to copy directly from DDB to Redshift, because a direct copy usually involves a scan operation. That consumes read capacity, which we want to avoid since these tables are pretty large. I could not find much on how to use a COPY command with a json.gz file. Please let me know if someone can find a way to do this. I tried treating it like plain JSON, as suggested in one of the comments:
copy itemtable from 's3://bucket/path/file.json.gz' iam_role '<role>' json 'auto ignorecase'
It did not work. My file, when unzipped, is in this format (one JSON object per line):
{"Item":{"field":{"S":"value"},"field":{"N":"value"}}}
{"Item":{"field":{"S":"value"},"field":{"N":"value"}}}
The exact error is:

Load into table 'itemtable' failed. Check 'stl_load_errors' system table for details
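When COPY fails with that message, the row-level cause is recorded in Redshift's stl_load_errors system table. A minimal query to inspect the most recent failures might look like this (the columns used are standard in that system table; the LIMIT is just illustrative):

```sql
-- Inspect the most recent COPY failures recorded by Redshift.
-- err_reason and raw_line usually point at the offending input row.
SELECT starttime,
       filename,
       line_number,
       colname,
       err_code,
       err_reason,
       raw_line
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```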

Doing a few things worked for me.

copy <table name>
from '<input s3 location>'
iam_role '<iam role>'
json '<jsonpath file location>' gzip ACCEPTINVCHARS ' ' TRUNCATECOLUMNS TRIMBLANKS
region '<aws region>'

Here the jsonpath.json file is in the following format:

{
  "jsonpaths": [
    "$['Item']['Field1']['S']",
    "$['Item']['Field2']['N']",
    .
    .
    .
  ]
}

And the table contains the same columns as the fields specified in the jsonpaths file, in the same order.
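As a sketch of what that pairing could look like (the table and column names here are hypothetical, matching the Field1/Field2 placeholders above; DynamoDB's S type maps naturally to VARCHAR and N to a numeric type):

```sql
-- Hypothetical target table matching the jsonpaths file above.
-- Column order must match the order of expressions in "jsonpaths".
CREATE TABLE itemtable (
    field1 VARCHAR(256),  -- populated from $['Item']['Field1']['S']
    field2 BIGINT         -- populated from $['Item']['Field2']['N']
);
```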

As suggested in John Rotenstein's comment, the COPY command takes care of gzip (via the GZIP option), so we don't need to decompress the files ourselves.
