Loading quoted numbers into a Snowflake table from CSV with COPY INTO <TABLE>
I have a problem with loading CSV data into a Snowflake table. The fields are wrapped in double quote marks, and hence there is a problem importing them into the table.
I know that COPY INTO has the CSV-specific option FIELD_OPTIONALLY_ENCLOSED_BY = '"', but it's not working at all.
Here are some pieces of the table definition and the copy command:
CREATE TABLE ...
(
GamePlayId NUMBER NOT NULL,
etc...
....);
COPY INTO ...
FROM ...csv.gz'
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT"
;
The CSV file looks like this:
"3922000","14733370","57256","2","3","2","2","2019-05-23 14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000"
I get an error:
Numeric value '"3922000"' is not recognized
I'm pretty sure it's because the NUMBER value is interpreted as a string when Snowflake reads the "" marks, but since I use
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
it shouldn't even be there... Does anyone have a solution to this?
Maybe something is incorrect with your file? I was just able to run the following without issue.
1. create the test table:
CREATE OR REPLACE TABLE
dbNameHere.schemaNameHere.stacko_58322339 (
num1 NUMBER,
num2 NUMBER,
num3 NUMBER);
2. create a test file with the following contents
1,2,3
"3922000","14733370","57256"
3,"2",1
4,5,"6"
3. create stage and put file in stage
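For completeness, step 3 can be done with something like the following; the stage name and local path are placeholders, not from the original answer. With AUTO_COMPRESS = TRUE (the default), PUT gzips the file, which is why the COPY below references a .csv.gz:

```sql
-- Create an internal named stage (name is illustrative)
CREATE OR REPLACE STAGE stageNameHere;

-- Upload the local test file into the stage (run from SnowSQL;
-- the local path is a placeholder)
PUT file:///tmp/stacko_58322339.csv @stageNameHere AUTO_COMPRESS = TRUE;
```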
4. run the following copy command
COPY INTO dbNameHere.schemaNameHere.STACKO_58322339
FROM @stageNameHere/stacko_58322339.csv.gz
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 0
ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "CONTINUE";
5. results
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| stageNameHere/stacko_58322339.csv.gz | LOADED | 4 | 4 | 4 | 0 | NULL | NULL | NULL | NULL |
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.436s
6. view the records
>SELECT * FROM dbNameHere.schemaNameHere.stacko_58322339;
+---------+----------+-------+
| NUM1 | NUM2 | NUM3 |
|---------+----------+-------|
| 1 | 2 | 3 |
| 3922000 | 14733370 | 57256 |
| 3 | 2 | 1 |
| 4 | 5 | 6 |
+---------+----------+-------+
Can you try a similar test to this?
EDIT: A quick look at your data shows that many of your numeric fields appear to start with commas, so something is definitely amiss with the data.
Assuming your numbers are European-formatted, with ',' as the decimal separator and '.' as the thousands separator: reading the numeric formatting help, it seems Snowflake does not support this as input. I'd open a feature request.
But if you read the column in as text and then use REPLACE, like
SELECT '100,1234'::text as A
,REPLACE(A,',','.') as B
,TRY_TO_DECIMAL(b, 20,10 ) as C;
gives:
A B C
100,1234 100.1234 100.1234000000
Safer would be to strip the thousands placeholders first, like
SELECT '1.100,1234'::text as A
,REPLACE(A,'.','') as B
,REPLACE(B,',','.') as C
,TRY_TO_DECIMAL(C, 20,10 ) as D;
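The same cleanup can also be applied during the load itself, since Snowflake allows a SELECT transformation inside COPY INTO. A minimal sketch, assuming REPLACE and TRY_TO_DECIMAL are permitted in the transformation and with illustrative table, stage, and column positions (not from the original question):

```sql
COPY INTO dbNameHere.schemaNameHere.target_table
FROM (
  SELECT $1,
         -- strip '.' thousands separators, then turn the ',' decimal into '.'
         TRY_TO_DECIMAL(REPLACE(REPLACE($2, '.', ''), ',', '.'), 20, 10)
  FROM @stageNameHere/stacko_58322339.csv.gz
)
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```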
Thanks for the help. It turned out that the file was UTF-16 encoded.
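For reference, the file's encoding can be declared in the file format so that Snowflake decodes it correctly before parsing. A sketch with placeholder table, stage, and file names:

```sql
COPY INTO dbNameHere.schemaNameHere.target_table
FROM @stageNameHere/file.csv.gz
FILE_FORMAT = (TYPE = CSV
               ENCODING = 'UTF-16'
               FIELD_OPTIONALLY_ENCLOSED_BY = '"'
               SKIP_HEADER = 1);
```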