
Snowflake - Parallel processing and copy of large zip file to a SF Table

We have a process that loads data from a CSV into a Snowflake table. The input file is in gzip format and expands to roughly 70 to 80 GB after decompression. At present the process reads the gzip file and inserts it directly into the staging table, and on a Medium warehouse it takes about 3 to 3.5 hours to complete. I need to understand whether any parallelism can be applied here for faster processing.
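One relevant point: Snowflake parallelizes a load across *files*, not within a single file, so one large gzip archive is processed by a single thread no matter the warehouse size. A minimal sketch of the usual workaround, assuming the source file has been pre-split outside Snowflake into many smaller gzip parts (on the order of 100-250 MB each) and uploaded to a stage; the stage name `@my_azure_stage` and the part-file naming are hypothetical:

```sql
-- Assumes the 70-80 GB file was split outside Snowflake into many
-- smaller gzip parts (test_file_part_000.gz, test_file_part_001.gz, ...)
-- so the warehouse can load them in parallel. Stage name is illustrative.
COPY INTO TEST_DB.TEMP_TABLE_STG
FROM @my_azure_stage/test/apps/
FILE_FORMAT = (FORMAT_NAME = 'MANAGEMENT.TEST_GZIP_FORMAT')
PATTERN = '.*test_file_part_.*[.]gz';
```

With many input files, a larger warehouse can actually be used: each node loads several files concurrently, which is where the speedup comes from.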

CREATE OR REPLACE FILE FORMAT MANAGEMENT.TEST_GZIP_FORMAT
  TYPE = CSV
  FIELD_DELIMITER = ';'
  SKIP_HEADER = 2
  ESCAPE_UNENCLOSED_FIELD = NONE
  TRIM_SPACE = TRUE;


INSERT INTO TEST_DB.TEMP_TABLE (emp_id, emp_name)
SELECT DISTINCT
    temp.$1 AS emp_id,
    temp.$2 AS emp_name
FROM /Azureserverlocation/test/apps/ (file_format => MANAGEMENT.TEST_GZIP_FORMAT, pattern => './test_file.gz') temp;

Can you break your process into two stages and use `COPY INTO`?

  1. Use `COPY INTO` to copy the data into a staging table.
COPY INTO TEST_DB.TEMP_TABLE_STG
FROM '/Azureserverlocation/test/apps/'
FILE_FORMAT = (FORMAT_NAME = 'MANAGEMENT.TEST_GZIP_FORMAT')
PATTERN = '.*test_file[.]gz';
  2. Then select the distinct rows from the staging table.
CREATE TABLE TEST_DB.TEMP_TABLE AS
SELECT DISTINCT emp_id, emp_name
FROM TEST_DB.TEMP_TABLE_STG;
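Since `COPY INTO` only parallelizes across files, it is worth checking how many files were actually loaded and whether any failed. A minimal sketch using Snowflake's `INFORMATION_SCHEMA.COPY_HISTORY` table function; the table name and 24-hour window are illustrative:

```sql
-- Inspect recent COPY activity on the staging table: one row per
-- loaded file, with row counts and load status.
SELECT file_name, row_count, status, last_load_time
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'TEST_DB.TEMP_TABLE_STG',
    START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())))
ORDER BY last_load_time DESC;
```

If this shows only a single file, the load was effectively serial, and splitting the source file is the first thing to try.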



