AWS Glue Crawler glob Exclude Pattern functionality

Original post: 2022-06-21 13:12:43 · Tags: amazon-web-services, aws-glue, aws-glue-data-catalog

We need to ignore a few paths while crawling a specific path. Below are the details:

Include Path: s3://dev-bronze/api/sp/reports/xyz/
Exclude Path: brand=abc/client=xxx/**

Full path: "s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/"

We want to ignore a few clients' data, so I am using the glob above as the exclude pattern, but it doesn't seem to work. Any help will be highly appreciated.

1 Answer

To clarify the difference between the exclude patterns brand=abc/client=xxx/** and brand=abc/client=xxx** (note the missing / before the ** in the second one):

Exclude pattern brand=abc/client=xxx/** matches:

s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/<subfolder1>/file1.txt
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/<subfolder2>/file2.txt

This pattern matches objects in all subfolders of brand=abc/client=xxx/.

Exclude pattern brand=abc/client=xxx** matches:

s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/file1.txt
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/file2.txt

This pattern matches all objects in brand=abc/client=xxx/.
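To compare the two patterns offline, here is a minimal Python sketch. It assumes only the documented wildcard rules (* stays within one path component, ** crosses folder boundaries) and that exclude patterns are evaluated relative to the include path. It is a simplified model, not Glue's actual matcher, and the helper names glob_to_regex and is_excluded as well as the sub1 and client=xxxyz keys are hypothetical.

```python
import re

INCLUDE_PATH = "s3://dev-bronze/api/sp/reports/xyz/"


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a tiny glob subset: '**' crosses '/', a single '*' does not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # zero or more characters, crossing folders
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # zero or more characters within one component
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(include_path: str, pattern: str, s3_uri: str) -> bool:
    """Match the pattern against the part of the URI that follows the include path."""
    if not s3_uri.startswith(include_path):
        return False
    relative_key = s3_uri[len(include_path):]
    return glob_to_regex(pattern).fullmatch(relative_key) is not None


if __name__ == "__main__":
    # Hypothetical object keys used only for illustration.
    keys = [
        INCLUDE_PATH + "brand=abc/client=xxx/file1.txt",
        INCLUDE_PATH + "brand=abc/client=xxx/sub1/file1.txt",
        INCLUDE_PATH + "brand=abc/client=xxxyz/file1.txt",
    ]
    for pattern in ("brand=abc/client=xxx/**", "brand=abc/client=xxx**"):
        for key in keys:
            print(pattern, key, is_excluded(INCLUDE_PATH, pattern, key))
```

In this model, brand=abc/client=xxx** also matches sibling prefixes that merely start with client=xxx (such as the hypothetical client=xxxyz/), while brand=abc/client=xxx/** does not. The crawler's real behavior is what counts, so confirm the result with a test crawl.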

If you want to exclude the files in brand=abc/client=xxx/, use the exclude pattern brand=abc/client=xxx**.
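As a rough sketch of how the suggested pattern could be applied programmatically, the snippet below builds the S3 target structure that boto3's Glue client accepts (Path plus Exclusions) and passes it to update_crawler. The crawler name reports-xyz-crawler and both helper functions are hypothetical, and the call assumes AWS credentials and an existing crawler. Note that update_crawler sets the crawler's targets to exactly what is passed, so include every target the crawler should keep.

```python
from typing import Dict, List


def build_targets(include_path: str, exclude_patterns: List[str]) -> Dict:
    """Build the Targets argument: one S3 target with exclude patterns
    that are relative to the include path."""
    return {
        "S3Targets": [
            {
                "Path": include_path,
                "Exclusions": list(exclude_patterns),
            }
        ]
    }


def apply_exclusions(crawler_name: str, targets: Dict) -> None:
    """Update an existing crawler. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so build_targets works without boto3 installed

    glue = boto3.client("glue")
    glue.update_crawler(Name=crawler_name, Targets=targets)


if __name__ == "__main__":
    targets = build_targets(
        "s3://dev-bronze/api/sp/reports/xyz/",
        ["brand=abc/client=xxx**"],
    )
    # "reports-xyz-crawler" is a hypothetical crawler name.
    apply_exclusions("reports-xyz-crawler", targets)
```

To skip several clients, add one pattern per client to the exclusions list.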

Reference: Crawler Properties > Include and Exclude Patterns (AWS)

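Notice: Technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.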
