
Loading data from Blob to Snowflake table

I have a .txt file, and the first row of the file contains the column names. I want to load this data into a Snowflake table.

First: how can I run a SELECT statement that shows all the columns from the file using *? I don't want to write t.$1, t.$2, etc.

Something similar to: SELECT t.* FROM '@azure_blob_stage_poc/Dim_Date.txt' AS t ORDER BY 1;

Also, when loading the data into the table, I have to ignore the first row of the file, as it contains the column names. I need a Snowflake script similar to: COPY INTO POC.Dim_Date FROM '@azure_blob_stage_poc/Dim_Date.txt';

If I don't ignore the first row and try to load, I get this error message: "Field delimiter ',' found while expecting record delimiter '\n' File 'Dim_Date.txt', line 2, character 547 Row 2, column "DIM_DATE"["LOAD_DT":55] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client."

Please read the documentation on the COPY INTO <table> command.

The CSV section has a parameter called SKIP_HEADER that can be used to skip the header line.
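For example, a sketch using the stage and table names from the question (the comma delimiter and quoting options are assumptions; adjust them to match your file):

```sql
-- Skip the header row via an inline file format with SKIP_HEADER = 1.
-- Stage and table names come from the question; delimiter/quoting are assumed.
COPY INTO POC.Dim_Date
FROM '@azure_blob_stage_poc/Dim_Date.txt'
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```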

Your "question" is less an actual question, but there is a "how" in there somewhere related to text file discovery. Normally this is what an ETL/integration tool does for you, but the obvious thing is to look at the file in a text editor.

I'll get the list of columns by reading a complete record as a single field and splitting it using SPLIT_TO_TABLE():

CREATE OR REPLACE STAGE my_stage URL = 's3://<bucket>/<path>/' CREDENTIALS = ( ... );
CREATE OR REPLACE FILE FORMAT TEST_TXT TYPE = CSV FIELD_DELIMITER = NONE;

SELECT
  LISTAGG('$'||INDEX||' "'||TRIM(VALUE, '"')||'"', ', ') WITHIN GROUP (ORDER BY INDEX) COLS
FROM '@my_stage/my_file' (FILE_FORMAT => 'TEST_TXT') x
CROSS JOIN LATERAL SPLIT_TO_TABLE(x.$1, ',') s
GROUP BY SEQ HAVING SEQ = 1;
    =>
$1 "Order date", $2 "Item code", $3 "Quantity"
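As a side note, SPLIT_TO_TABLE() flattens a delimited string into rows with SEQ, INDEX, and VALUE columns, which is what the LISTAGG above aggregates. A minimal standalone illustration (the header string here is made up for the example):

```sql
-- Each element of the input string becomes one row.
-- Columns: SEQ (input sequence number), INDEX (element position), VALUE (element text).
SELECT * FROM TABLE(SPLIT_TO_TABLE('Order date,Item code,Quantity', ','));
```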

Then I just copy the resulting COLS into a new SELECT using a new FILE FORMAT:

CREATE OR REPLACE FILE FORMAT TEST_TXT2
    TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"';

SELECT $1 "Order date", $2 "Item code", $3 "Quantity"
FROM '@my_stage/my_file' (FILE_FORMAT => 'TEST_TXT2') x;

The special SQL construct * for column names only works for named record sets. There is no way to convert data content into SQL column names.
