Upload CSVs of JSON data from S3 to Redshift
I have thousands of unusually formatted CSVs sitting in S3 that I need to upload to Redshift.
The CSVs are formatted like so:
Column A Column B ..... Column Z
{"id": 2034823, "created": "2017-1-1", "result": true}
In other words, each row of the CSV is valid JSON.
I've tried a simple copy command, but to no avail. I also tried adding the format as json 'auto' flag, but I am still receiving errors:
Invalid Value: err_code 1216, line number 1, position 0
Is there a recommended way to handle CSVs in this format? I want to save them into an existing Redshift table that already has types defined.
I have exactly the same type of files. These are the steps I followed to load them into a Redshift table:
1. Create an external table in Redshift Spectrum with a struct column; in your case:
CREATE EXTERNAL TABLE <spectrum schema>.<your external table>
(
data struct<
id:integer,
created:timestamp,
...
result:varchar(5)>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties (
'dots.in.keys' = 'true',
'mapping.requesttime' = 'requesttimestamp')
location 's3://<your S3 bucket>/';
2. Insert the rows from the external table into your Redshift table:
INSERT INTO <your Redshift table>
SELECT data.id, data.created, ..., data.result
FROM <spectrum schema>.<your external table>;
See how to set up Redshift Spectrum: https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html
Let me know if you have further questions.
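If rows are still rejected after these steps, it may help to check locally that every line of a file really parses as a JSON object. The following is only a minimal sketch in Python, assuming a file has been downloaded from S3 and read line by line. The function name find_invalid_json_lines is hypothetical, and the idea that a header line such as "Column A Column B" could be what fails at line 1, position 0 is an assumption, not something the error message confirms.

```python
import json
from typing import Iterable, List, Tuple


def find_invalid_json_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for each non-empty line that is not a JSON object.

    Hypothetical helper for illustration; it is not part of any AWS tooling.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            # exc.msg and exc.pos describe what failed and where in the line
            problems.append((number, f"{exc.msg} at position {exc.pos}"))
            continue
        if not isinstance(value, dict):
            problems.append((number, "valid JSON, but not an object"))
    return problems


if __name__ == "__main__":
    # Sample input: an assumed header line followed by one JSON row.
    sample = [
        "Column A Column B",
        '{"id": 2034823, "created": "2017-1-1", "result": true}',
    ]
    for number, message in find_invalid_json_lines(sample):
        print(f"line {number}: {message}")
```

To check a real file, pass an open file handle instead of the sample list, since iterating over a text file yields its lines.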