
AWS Data Pipeline RedShift "delimiter not found" error

I'm working on a data pipeline. In one of the steps, a CSV from S3 is consumed by a RedShift DataNode. My RedShift table has 78 columns, checked with:

SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'my_table';

After the RedshiftCopyActivity fails, the stl_load_errors table shows a "Delimiter not found" (1214) error for line number 1, column namespace (the second column, varchar(255)), at position 0. The consumed CSV line looks like this:

0,my.namespace.string,2119652,458031,S,60,2015-05-02,2015-05-02 14:51:02,2015-05-02 14:51:14.0,1,Counter,1,Counter 01,91,Chaymae,0,,,,227817,1,Dine In,5788,2015-05-02 14:51:02,2015-05-02 14:51:27,17.45,0.00,0.00,17.45,,91,Chaymae,0,0.00,12,M,A,-1,13,F,0,0,2,2.50,F,1094055,Coleslaw Md Upt,8,Sonstige,900,Sides,901,Sides,0.00,0.00,0,,,0.0000,0,0,,,0.00,0.0000,0.0000,0,,,0.00,0.0000,,1,Woche Counter,127,Coleslaw Md Upt,2,2.50
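As an aside, the error details above can be pulled directly from the system table with a query along these lines (column names are those of Redshift's stl_load_errors table; my_table rows will show up by filename):

```sql
-- Show the most recent COPY failures with the offending field and reason.
SELECT query,
       TRIM(filename)        AS filename,
       line_number,
       TRIM(colname)         AS colname,
       position,
       TRIM(raw_field_value) AS raw_field_value,
       err_code,
       TRIM(err_reason)      AS err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```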

After a simple replacement ("," to "\n") I get 78 lines, so it looks like the data should match... I'm stuck on this. Does anyone know how I can find more information about the error, or see a solution?
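As a quick sanity check, the per-line field count can also be computed with awk (a naive count that assumes no quoted commas, which holds for the line above; "my_file.csv" is a placeholder for the file pulled from S3):

```shell
# Count the comma-separated fields on each line of the CSV
# (naive: does not handle quoted commas).
awk -F',' '{ print NR ": " NF " fields" }' my_file.csv
```

A line that prints anything other than 78 fields is the one Redshift will choke on.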

EDIT

Query:

select d.query, substring(d.filename,14,20), 
d.line_number as line, 
substring(d.value,1,16) as value,
substring(le.err_reason,1,48) as err_reason
from stl_loaderror_detail d, stl_load_errors le
where d.query = le.query
and d.query = pg_last_copy_id(); 

returns 0 rows.

I figured it out; maybe it will be useful for someone else:

There were in fact two problems:

  1. The first field in my Redshift table was of type INT IDENTITY(1,1), and in the CSV I had a 0 value there. After removing the first column from the CSV, everything was copied without a problem, even without an explicit column mapping, provided that...
  2. The DELIMITER ',' commandOption was added to the S3ToRedshiftCopyActivity to force the use of a comma. Without it, RedShift recognized the dot in the namespace (my.namespace.string) as the delimiter.
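For illustration, the relevant part of the pipeline definition might look like this sketch (node ids and refs are placeholders; commandOptions is the RedshiftCopyActivity field that passes raw COPY options through to Redshift):

```json
{
  "id": "S3ToRedshiftCopy",
  "type": "RedshiftCopyActivity",
  "input": { "ref": "S3InputDataNode" },
  "output": { "ref": "RedshiftTableDataNode" },
  "insertMode": "KEEP_EXISTING",
  "commandOptions": ["DELIMITER ','"]
}
```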

You need to add FORMAT AS JSON 's3://yourbucketname/aJsonPathFile.txt'. AWS does not call this out clearly. Note that this only works when your data is in JSON form, like

{"attr1": "val1", "attr2": "val2"}
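Put together, a COPY of JSON data would look roughly like the following sketch (the bucket, role ARN, and jsonpaths file name are placeholders; the jsonpaths file lists one JSONPath expression per target column, in column order):

```sql
-- Hypothetical COPY of JSON data using an explicit jsonpaths file.
-- aJsonPathFile.txt would contain something like:
--   { "jsonpaths": ["$.attr1", "$.attr2"] }
COPY my_table
FROM 's3://yourbucketname/data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS JSON 's3://yourbucketname/aJsonPathFile.txt';
```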

