AWS Data Pipeline RedShift "delimiter not found" error
I'm working on a data pipeline. In one of the steps, a CSV from S3 is consumed by a RedShift DataNode. My RedShift table has 78 columns, checked with:
SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'my_table';
After the RedshiftCopyActivity fails, the stl_load_errors table shows a "Delimiter not found" (1214) error for line number 1, column namespace (this is the second column, varchar(255)), at position 0. The consumed CSV line looks like this:
0,my.namespace.string,2119652,458031,S,60,2015-05-02,2015-05-02 14:51:02,2015-05-02 14:51:14.0,1,Counter,1,Counter 01,91,Chaymae,0,,,,227817,1,Dine In,5788,2015-05-02 14:51:02,2015-05-02 14:51:27,17.45,0.00,0.00,17.45,,91,Chaymae,0,0.00,12,M,A,-1,13,F,0,0,2,2.50,F,1094055,Coleslaw Md Upt,8,Sonstige,900,Sides,901,Sides,0.00,0.00,0,,,0.0000,0,0,,,0.00,0.0000,0.0000,0,,,0.00,0.0000,,1,Woche Counter,127,Coleslaw Md Upt,2,2.50
After a simple replacement ("," to "\n") I get 78 lines, so it looks like the data should match... I'm stuck on this. Maybe someone knows how I can find more information about the error, or sees the solution?
EDIT
Query:
select d.query, substring(d.filename,14,20),
d.line_number as line,
substring(d.value,1,16) as value,
substring(le.err_reason,1,48) as err_reason
from stl_loaderror_detail d, stl_load_errors le
where d.query = le.query
and d.query = pg_last_copy_id();
returns 0 rows.
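When the joined query above comes back empty, it can help to query stl_load_errors on its own, since stl_loaderror_detail is not always populated for parse errors like 1214. A diagnostic sketch (column names follow the documented schema of the system table):

```sql
-- Inspect the most recent load errors directly; raw_line and
-- raw_field_value show exactly what Redshift tried to parse.
SELECT starttime,
       TRIM(filename)        AS filename,
       line_number,
       TRIM(colname)         AS colname,
       position,
       err_code,
       TRIM(err_reason)      AS err_reason,
       TRIM(raw_field_value) AS raw_field_value,
       TRIM(raw_line)        AS raw_line
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```

The raw_line and raw_field_value columns are often the quickest way to see which delimiter Redshift actually applied.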
I figured it out, and maybe it will be useful for someone else:
There were in fact two problems.
The first field in my RedShift table was of type

INT IDENTITY(1,1)

and in the CSV I had a 0 value there. After removing the first column from the CSV, everything was copied without a problem, even without a specified column mapping, provided the

DELIMITER ','

commandOption was added to the S3ToRedshiftCopyActivity to force use of the comma. Without it, RedShift recognized the dot in the namespace (my.namespace.string) as the delimiter.

You need to add FORMAT AS JSON 's3://yourbucketname/aJsonPathFile.txt'. AWS has not mentioned this yet. Please note that this only works when your data is in JSON form, like

{'attr1': 'val1', 'attr2': 'val2'}
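For reference, the two fixes described above roughly correspond to COPY commands like the following; the Data Pipeline activity builds a similar command from its commandOptions. The table name, S3 paths, and IAM role below are placeholders, not values from the original post:

```sql
-- CSV fix: force the comma delimiter explicitly so Redshift does not
-- guess it from the data (e.g. the dots in my.namespace.string).
COPY my_table
FROM 's3://yourbucketname/data/file.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER ',';

-- JSON fix: point COPY at a jsonpaths file that maps JSON attributes
-- to table columns. The jsonpaths file itself contains, e.g.:
--   {"jsonpaths": ["$.attr1", "$.attr2"]}
COPY my_table
FROM 's3://yourbucketname/data/file.json'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS JSON 's3://yourbucketname/aJsonPathFile.txt';
```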