Specify row delimiter in Redshift COPY command

I am trying to use the COPY command to import data into Redshift. Unfortunately, the data is not sanitized very well and some of it contains CRLF characters. This causes an error because COPY treats what follows as a new record.

I am already using the DELIMITER parameter, but that sets the delimiter for the fields within each record. Is there a similar way to specify which character(s) delimit each record?

No. Redshift expects \n (0x0A) as the end-of-record (EOR) marker and doesn't handle CRLF (0x0D 0x0A). I believe it just sees the CR as another piece of input data, but that character cannot be loaded into anything other than a varchar column. If your lines end with just CR (0x0D), Redshift won't see an end of record at all and will combine rows.
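To make the behavior described above concrete, here is a toy Python model of a loader that ends records only at LF and treats CR as ordinary data. It is not Redshift's actual parser; the function name `split_records` and the `|` field delimiter are assumptions chosen for illustration.

```python
def split_records(data: bytes, field_delimiter: bytes = b"|") -> list:
    """Toy model: records end at LF (0x0A) only; CR (0x0D) is just data."""
    # Split on LF and drop the empty piece left after a trailing LF.
    records = [r for r in data.split(b"\n") if r]
    # Split each record into fields on the (assumed) field delimiter.
    return [r.split(field_delimiter) for r in records]


# CRLF-terminated input: the CR ends up inside the last field of each record.
print(split_records(b"1|alice\r\n2|bob\r\n"))
# [[b'1', b'alice\r'], [b'2', b'bob\r']]

# CR-only input: no LF is ever seen, so both rows merge into one record.
print(split_records(b"1|alice\r2|bob\r"))
# [[b'1', b'alice\r2', b'bob\r']]
```

In the first case the stray CR is harmless only if the last column can hold arbitrary characters; in the second the rows are combined, as the answer describes.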

You will need to cleanse your data to remove the CR characters. Each record needs to end with a newline, NL (0x0A). (Yes, LF and NL are the same ASCII code and just have different names in different applications.) Hopefully you can just remove the CRs, but I've seen data with only CR as the record terminator, and in that case you will need to change the CRs to NL rather than remove them.
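One way to do this cleansing before running COPY is a small preprocessing step. The following is a minimal sketch in Python, assuming that CR and CRLF appear only as record terminators (a CR embedded inside a field would also become a record break) and that the file fits in memory; the function name `normalize_line_endings` is hypothetical.

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Turn CRLF and lone-CR record terminators into a single LF (0x0A)."""
    # Handle CRLF first so the CR of each pair is dropped, not converted.
    data = data.replace(b"\r\n", b"\n")
    # Any CR still present stood alone as a terminator: convert it to LF
    # instead of deleting it, so the rows are not merged.
    return data.replace(b"\r", b"\n")


sample = b"1|alice\r\n2|bob\r3|carol\n"
print(normalize_line_endings(sample))
# b'1|alice\n2|bob\n3|carol\n'
```

For files too large to load at once, the same replacement would have to be applied in chunks, taking care that a CRLF pair split across two chunks is not converted into two LFs.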

If your last column of data is a varchar, then you can (I believe) just strip the CR character from these strings after the data is loaded into Redshift. Otherwise, your data needs to be fixed before it enters Redshift.
