
AWS Glue Crawler glob Exclude Pattern functionality

We need to ignore a few paths while crawling a specific include path. Below are the details:

Include Path: s3://dev-bronze/api/sp/reports/xyz/
Exclude Path: brand=abc/client=xxx/**

Full path: "s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/"

We want to ignore a few clients' data, so I am using the glob above, but it doesn't seem to work. Any help will be highly appreciated.

To clarify the difference between the exclude patterns brand=abc/client=xxx/** and brand=abc/client=xxx** (note the missing /):

Exclude pattern brand=abc/client=xxx/** matches:

s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/<subfolder1>/file1.txt
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/<subfolder2>/file2.txt

This pattern matches objects in all subfolders of brand=abc/client=xxx/.

Exclude pattern brand=abc/client=xxx** matches:

s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/file1.txt
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/file2.txt

This pattern matches all objects in brand=abc/client=xxx/.

If you want to exclude the files in brand=abc/client=xxx/, use the exclude pattern brand=abc/client=xxx** (without the / before the **).
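For anyone configuring the crawler from code rather than the console, here is a minimal sketch of how the suggested pattern could be passed through boto3. The helper names build_s3_target and create_crawler_with_exclusions are hypothetical, as are the crawler name, role, and database arguments; only the include path and exclude pattern come from the question. Exclude patterns are evaluated relative to the include path, so the sketch rejects patterns that repeat the s3:// prefix.

```python
def build_s3_target(include_path: str, exclusions: list) -> dict:
    """Build one entry for the crawler's Targets['S3Targets'] list (hypothetical helper)."""
    for pattern in exclusions:
        # Exclude patterns are relative to the include path, so a full
        # s3:// URI here is almost certainly a mistake.
        if pattern.startswith("s3://"):
            raise ValueError(f"exclude pattern must be relative: {pattern}")
    return {"Path": include_path, "Exclusions": list(exclusions)}


# Values taken from the question, using the pattern suggested in the answer.
TARGET = build_s3_target(
    "s3://dev-bronze/api/sp/reports/xyz/",
    ["brand=abc/client=xxx**"],
)


def create_crawler_with_exclusions(name: str, role_arn: str, database: str) -> None:
    """Create a crawler for TARGET; all three arguments are placeholders you supply."""
    import boto3  # imported lazily so the helper above works without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(
        Name=name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [TARGET]},
    )
```

To skip several clients, add one pattern per client to the exclusions list, for example "brand=abc/client=yyy**" for a hypothetical second client.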

Reference: Crawler Properties > Include and Exclude Patterns (AWS)

