AWS Glue Crawler glob Exclude Pattern functionality
We need to ignore a few paths while crawling a specific path. Below are the details:
Include Path: s3://dev-bronze/api/sp/reports/xyz/
Exclude Path: brand=abc/client=xxx/**
Full path: "s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/"
We want to ignore a few clients' data, so I am using the above glob, but it doesn't seem to work. Any help would be highly appreciated.
Clarifying the difference between the exclude patterns brand=abc/client=xxx/** and brand=abc/client=xxx** (note the missing /).
Exclude pattern brand=abc/client=xxx/** matches:
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/<subfolder1>/file1.txt
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/<subfolder2>/file2.txt
This pattern will match objects in all subfolders of brand=abc/client=xxx/.
Exclude pattern brand=abc/client=xxx** matches:
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/file1.txt
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/file2.txt
This pattern will match all objects in brand=abc/client=xxx/.
If you want to exclude files in brand=abc/client=xxx/, then use the exclude pattern brand=abc/client=xxx**.
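For anyone defining the crawler in code rather than in the console, here is a minimal sketch of how the include path and the suggested exclude pattern might be passed through boto3's Glue client. The helper names (build_s3_target, create_crawler) and the crawler name, role ARN and database arguments are hypothetical placeholders, not part of the original question. The exclusion is written relative to the include path, as in the question.

```python
def build_s3_target(include_path, exclusions):
    """Build the Targets structure that Glue's create_crawler call expects.

    Exclude patterns are given relative to the include path.
    """
    return {
        "S3Targets": [
            {
                "Path": include_path,
                "Exclusions": list(exclusions),
            }
        ]
    }


# Include path and exclude pattern taken from the question and answer above.
TARGETS = build_s3_target(
    "s3://dev-bronze/api/sp/reports/xyz/",
    ["brand=abc/client=xxx**"],
)


def create_crawler(name, role_arn, database):
    """Create the crawler (needs boto3 and AWS credentials).

    name, role_arn and database are placeholders supplied by the caller.
    """
    import boto3  # imported here so the rest of the sketch runs without boto3

    glue = boto3.client("glue")
    glue.create_crawler(
        Name=name,
        Role=role_arn,
        DatabaseName=database,
        Targets=TARGETS,
    )
```

To skip several clients, add one pattern per client to the exclusions list, for example a second entry for another (hypothetical) client folder under the same brand.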
Reference: Crawler Properties > Include and Exclude Patterns (AWS)