
Regular expression in COPY INTO command in Snowflake

I have a few CSV files in Azure blob storage, and we are using the COPY INTO command to load the files into a Snowflake table. The problem: the file layout is container >> folder (e.g. account) >> monthly files such as 2011-09.csv, 2011-10.csv, and so on. The account folder also has a sub-folder, 'Snapshot', which contains files with similar data but different names, such as 2019-11_1654478715.csv. As a result, the COPY INTO command populates the target table in Snowflake with duplicate rows.

I am using this:

copy into BINGO_DWH_DEV.LANDING.CRM_ACCOUNT_TEMP
  from 'azure://abc.blob.core.windows.net/abc-abc/account'
  credentials = (azure_sas_token = 'abc')
  ON_ERROR = 'CONTINUE'
  FILE_FORMAT = (type = csv field_delimiter = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '"');

Any ideas how I can use the COPY INTO command with a regular expression that picks only files like '2011-09.csv' and not the files from the Snapshot folder?

Appreciate your help.

You can use the PATTERN keyword with a regular expression to load only the files that match the pattern.

Please refer to the Snowflake documentation.

Example:

copy into emp_basic
  from @%emp_basic
  file_format = (type = csv field_optionally_enclosed_by='"')
  pattern = '.*2011-09.*.csv.gz'
  on_error = 'continue';
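
Applied to the Azure setup in the question, a sketch along the same lines (the URL, token, and table name are the placeholders from the question; this assumes PATTERN is matched against each file's path relative to the stage URL, so the monthly YYYY-MM.csv files match while files under the Snapshot/ sub-folder do not; note the doubled backslash, since backslash is an escape character inside Snowflake single-quoted strings):

copy into BINGO_DWH_DEV.LANDING.CRM_ACCOUNT_TEMP
  from 'azure://abc.blob.core.windows.net/abc-abc/account'
  credentials = (azure_sas_token = 'abc')
  pattern = '[0-9]{4}-[0-9]{2}\\.csv'
  on_error = 'continue'
  file_format = (type = csv field_delimiter = ',' field_optionally_enclosed_by = '"');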

It depends on how you set the stage location (Azure blob, S3, or GCP). Let's say your files land in the "folder" s3://yourbucket/folder1/[filename].gz, and you've set your stage to point to s3://yourbucket with the pattern:

pattern='.*2011-09.*csv.*.gz'

Then it will scan all files under s3://yourbucket.

If, however, your stage has been set up to point to the folder s3://yourbucket/folder1/ and the pattern used is:

pattern='.*2011-09.*csv.*.gz'

Then it will look only in s3://yourbucket/folder1/.
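
Whichever stage layout you use, you can preview exactly which files a pattern will select, before running COPY INTO, by passing the same pattern to the LIST command (the stage name here is hypothetical):

list @my_account_stage pattern = '[0-9]{4}-[0-9]{2}\\.csv';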
