
Remove double quotes " while loading data to Amazon Redshift Spectrum

I want to load data into an Amazon Redshift external table. The data is in CSV format and the fields are quoted. Is there something like REMOVEQUOTES, which the COPY command has, for Redshift external tables? Also, what are the options for loading fixed-length data into an external table?

To create an external Spectrum table, refer to the CREATE TABLE syntax provided by Athena. To load a CSV file whose fields are enclosed in double quotes, use the following lines as your ROW FORMAT:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar' = '\"',
    'escapeChar' = '\\'
)
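
Before creating the table, it can help to check locally what these three properties do to a sample line. The sketch below is only an approximation: it uses Python's csv module rather than the SerDe itself, and the function name parse_quoted_line and the sample row are made up for illustration. Edge cases may differ from what Spectrum does.

```python
import csv
import io


def parse_quoted_line(line):
    """Split one CSV line with the same three characters as the SerDe properties.

    This is Python's csv parser, not OpenCSVSerde, so treat the result
    as a rough preview only.
    """
    reader = csv.reader(
        io.StringIO(line),
        delimiter=",",    # corresponds to 'separatorChar'
        quotechar='"',    # corresponds to 'quoteChar'
        escapechar="\\",  # corresponds to 'escapeChar'
    )
    return next(reader)


if __name__ == "__main__":
    # Hypothetical sample row: the quotes are dropped and the comma
    # inside the second field does not split it.
    print(parse_quoted_line('"1","Smith, John","42.50"'))
```

If the preview shows the quotes gone and embedded commas kept inside their field, the same property values are a reasonable starting point for the external table.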

For fixed-length files, use the RegexSerDe. In this case, the relevant portion of your CREATE TABLE statement will look like this (assuming 3 fields of length 100):

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{100})(.{100})(.{100})")
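
For other record layouts, the pattern can be generated from a list of field widths and tried against a sample line first. This is a minimal sketch using Python's re module. The helper names are hypothetical, and it assumes the SerDe requires the regex to match the whole line, which is my understanding of RegexSerDe but worth verifying on real data.

```python
import re


def fixed_width_pattern(widths):
    """Build an input.regex value with one capture group per fixed-width field."""
    return "".join("(.{%d})" % width for width in widths)


def split_fixed_width(line, widths):
    """Return the fields of a fixed-width line, or None if the line does not fit."""
    match = re.fullmatch(fixed_width_pattern(widths), line)
    if match is None:
        return None
    # Stripping the padding is a local convenience only; the regex itself
    # captures each field with its padding included.
    return [group.strip() for group in match.groups()]


if __name__ == "__main__":
    widths = [100, 100, 100]
    print(fixed_width_pattern(widths))
    sample = "a".ljust(100) + "b".ljust(100) + "c".ljust(100)
    print(split_fixed_width(sample, widths))
```

Because each capture group includes the padding, the column values in the external table will likely need trimming in queries.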

You can also use a regex to parse data enclosed by multiple characters. Example (in the CSV file, each of the 9 fields was surrounded by triple double quotes (""")):

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'input.regex' = "^\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*$"
)
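
A long pattern like this is easy to get wrong by hand, so here is a small sketch that rebuilds the same 9-field regex from a single per-field piece and tries it on a sample line. It uses Python's re module, not the SerDe, and the function names and sample values are hypothetical.

```python
import re

# One field: any run of double quotes, a captured value without quotes,
# then any run of double quotes again.
FIELD = r'"*([^"]*)"*'


def quoted_fields_pattern(field_count):
    """Join field_count copies of FIELD with commas, anchored at both ends."""
    return "^" + ",".join([FIELD] * field_count) + "$"


def parse_triple_quoted(line, field_count=9):
    """Return the captured values, or None when the line does not match."""
    match = re.match(quoted_fields_pattern(field_count), line)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    sample = ",".join('"""v%d"""' % i for i in range(1, 10))
    print(parse_triple_quoted(sample))
```

Note that the pattern assumes a value never contains a double quote. A line where one does will fail to match rather than be split incorrectly.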
