
Copying a Snowflake table with nulls and empty strings to a CSV that can be imported with the psql copy command

Suppose you have this table in Snowflake:

create table t (x string, y string) as select '', null;

If you copy it to an external stage with a csv file_format, you get the following error unless field_optionally_enclosed_by is set to something other than none:

Cannot unload empty string without file format option field_optionally_enclosed_by being specified.

So, let's say it's set to '"'.

create stage some_stg
url='s3://<some-bucket>/<some-dir>'
file_format = (type = csv field_optionally_enclosed_by='"' compression = none)
credentials = (aws_role = '<your-arn-for-snowflake>');

I'm sure this issue also reproduces with an internal stage, if you don't want to deal with getting Snowflake to use your S3 bucket.

When you run a copy for table t above:

copy into @some_stg/t.csv from t overwrite = true;

you get a file (t_0_0_0.csv) that looks like this:

"","\\N"

Then create the equivalent table in Postgres:

create table t (x varchar, y varchar);

When you load that file into Postgres with psql copy like this:

psql -h <host> -U <user> -c "copy t from stdin with csv null '\\N'" < t_0_0_0.csv

the contents of t in Postgres are:

x, y
"","\N"

This makes sense: Snowflake put the \\N in double quotes, so psql copy preserved it as a literal string. If you edit t_0_0_0.csv and remove the double quotes around the \\N:

"",\\N

and run psql copy again, the \\N is correctly converted to null.
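The difference between the two files follows from how COPY treats quoting in CSV mode: only an unquoted field that matches the null string is read as NULL, while a quoted field is always read as data. The sketch below is a toy emulation of that rule in Python, not PostgreSQL's actual parser. The names parse_csv_line and finish_field are hypothetical, it handles single-line records only, and it assumes the marker in the file is the two characters \N.

```python
def finish_field(chars, was_quoted, null_marker):
    """Return None (NULL) only for an unquoted field equal to the null marker."""
    value = "".join(chars)
    if not was_quoted and value == null_marker:
        return None
    return value


def parse_csv_line(line, null_marker="\\N"):
    """Split one CSV record into values, tracking whether each field was quoted."""
    line = line.rstrip("\r\n")
    fields = []
    chars = []
    was_quoted = False  # the current field contained a quoted section
    in_quotes = False   # we are currently between an opening and closing quote
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    chars.append('"')  # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                chars.append(ch)
        elif ch == '"':
            in_quotes = True
            was_quoted = True
        elif ch == ",":
            fields.append(finish_field(chars, was_quoted, null_marker))
            chars, was_quoted = [], False
        else:
            chars.append(ch)
        i += 1
    fields.append(finish_field(chars, was_quoted, null_marker))
    return fields


# The file as unloaded: the quoted marker stays a two-character string.
print(parse_csv_line('"","\\N"'))
# The hand-edited file: the unquoted marker becomes NULL (None here).
print(parse_csv_line('"",\\N'))
```

Under this rule the first line yields an empty string and the literal text \N, while the second yields an empty string and a NULL, which matches what psql copy produced in each case.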

There does not appear to be a way to generate a CSV file from Snowflake that represents both empty strings and nulls in a form that survives loading into Postgres. I experimented with the Snowflake file format options EMPTY_FIELD_AS_NULL and NULL_IF; Snowflake's documentation even speaks to this issue:

When unloading empty string data from tables, choose one of the following options:

Preferred: Enclose strings in quotes by setting the FIELD_OPTIONALLY_ENCLOSED_BY option, to distinguish empty strings from NULLs in output CSV files.

It does "distinguish" them, but not in a way that psql copy can use without first manipulating the file with sed.
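For reference, that pre-processing step can also be done with a CSV parser rather than sed, which avoids mangling fields that contain embedded quotes or commas. This is a minimal sketch in Python, assuming the unload quotes every field with " and doubles embedded quotes, and that the NULL marker in the file is the two characters \N. The helper name requote_for_postgres is hypothetical. A real string value equal to \N cannot be told apart from NULL in this output, so it would also be loaded as NULL.

```python
import csv
import io


def requote_for_postgres(src_text, null_marker="\\N"):
    """Rewrite a fully quoted CSV so that fields equal to null_marker are
    written unquoted, while every other field stays quoted."""
    out = io.StringIO()
    # newline="" lets the csv module handle line endings inside quoted fields
    reader = csv.reader(io.StringIO(src_text, newline=""))
    for row in reader:
        cells = []
        for value in row:
            if value == null_marker:
                # Unquoted marker: read as NULL by COPY ... CSV NULL '\N'
                cells.append(null_marker)
            else:
                # Quoted field: always read as data, even when empty
                cells.append('"' + value.replace('"', '""') + '"')
        out.write(",".join(cells) + "\n")
    return out.getvalue()


# The unloaded file from above, as a Python string
print(requote_for_postgres('"","\\N"\n'), end="")
```

For the file above this prints "",\N, the same result as the manual edit, so the psql copy command shown earlier can be used unchanged.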

Does anyone know how to generate a Snowflake CSV that preserves empty strings and nulls in a way that psql copy can reproduce?

Did you try the NULL_IF option in your file format? The following file format will unload your Snowflake NULL data as empty fields.

CREATE OR REPLACE FILE FORMAT UPDATED_FORMAT_NAME
TYPE = 'CSV'
COMPRESSION = 'NONE'
FIELD_DELIMITER =','
NULL_IF = ();
