Remove double quotes " while loading data to Amazon Redshift Spectrum
I want to load data into an Amazon Redshift external table. The data is in CSV format and the fields are quoted. Is there something like REMOVEQUOTES, which the COPY command supports, for Redshift external tables? Also, what are the options for loading fixed-length data into an external table?
To create an external Spectrum table, refer to the CREATE TABLE syntax provided by Athena. To load a CSV file whose fields are enclosed in double quotes, use the following lines as your ROW FORMAT:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)
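To preview what these three properties do to a quoted line before creating the table, here is a minimal local sketch. It uses Python's standard `csv` module as a stand-in; it is not the OpenCSVSerde itself, and the sample rows and the helper name `parse_quoted_csv` are made up for illustration.

```python
import csv
import io

def parse_quoted_csv(text):
    """Parse CSV text with the same separator, quote, and escape characters
    as the SerDe properties above (approximation, not the SerDe itself)."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=",",    # corresponds to 'separatorChar'
        quotechar='"',    # corresponds to 'quoteChar'
        escapechar="\\",  # corresponds to 'escapeChar'
    )
    return list(reader)

# Hypothetical sample data: a comma inside quotes and an escaped quote.
sample = '"1","Smith, John","NY"\n"2","say \\"hi\\"","CA"\n'
rows = parse_quoted_csv(sample)
for row in rows:
    print(row)
```

The enclosing quotes are dropped, the comma inside the quoted field stays part of the value, and the backslash-escaped quotes come through as literal quotes.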
For fixed-length files, use the RegexSerDe. In this case, the relevant portion of your CREATE TABLE statement will look like this (assuming 3 fields of length 100):
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{100})(.{100})(.{100})")
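The regex can be tried on a sample line locally before it goes into the table definition. The sketch below uses Python's `re` module, which handles this simple pattern the same way Java's regex engine does. It assumes the SerDe needs the whole line to match, which is why `fullmatch` is used; the sample values and the helper name `split_fixed_width` are hypothetical.

```python
import re

FIELD_WIDTH = 100
# Same pattern as in the SERDEPROPERTIES above: three groups of exactly 100 characters.
FIXED_WIDTH_PATTERN = re.compile(r"(.{100})(.{100})(.{100})")

def split_fixed_width(line):
    """Return the three raw 100-character fields, or None if the line does not fit."""
    match = FIXED_WIDTH_PATTERN.fullmatch(line)
    if match is None:
        return None
    return list(match.groups())

# Hypothetical record: each value is space-padded to the field width.
line = "alice".ljust(FIELD_WIDTH) + "42".ljust(FIELD_WIDTH) + "NY".ljust(FIELD_WIDTH)
fields = split_fixed_width(line)
trimmed = [f.rstrip() for f in fields]
short = split_fixed_width("too short")  # wrong length, so no match

print(trimmed)
print(short)
```

Each captured group keeps its padding, so the values will probably need trimming when the table is queried. A line that is not exactly 300 characters long does not match the pattern at all.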
You can also use a regex to parse data enclosed by multiple characters. Example (in the CSV file, fields were surrounded by triple double quotes (""")):
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex' = "^\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*$"
)
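A pattern this long is easy to get wrong, so it may help to check it against a sample line first. This sketch rebuilds the same nine-field regex in Python (without the SQL string escaping) and applies it to a made-up line; the names `FIELD`, `PATTERN`, and `parse_triple_quoted` are hypothetical.

```python
import re

FIELD_COUNT = 9
# One field: any run of double quotes, the captured value, any run of double quotes.
FIELD = r'"*([^"]*)"*'
# Nine such fields separated by commas, anchored to the whole line.
PATTERN = re.compile("^" + ",".join([FIELD] * FIELD_COUNT) + "$")

def parse_triple_quoted(line):
    """Return the nine captured values, or None if the line does not match."""
    match = PATTERN.match(line)
    return list(match.groups()) if match else None

# Hypothetical record with every field wrapped in triple double quotes.
values = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
line = ",".join('"""' + v + '"""' for v in values)
parsed = parse_triple_quoted(line)
too_few = parse_triple_quoted(",".join('"""' + v + '"""' for v in values[:8]))

print(parsed)
print(too_few)
```

The pattern has exactly nine groups, so a line with a different number of fields does not match, and the regex has to be regenerated if the column count changes.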