
AWS Glue Crawler is not updating the table after the first crawl

Original post: 2021-08-13 16:24:48 6 1 amazon-web-services / aws-glue-data-catalog

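I am adding a new Parquet file, created by Glue DataBrew, to my S3 folder. The new file has the same schema as the previous one. But when I run the crawler a second time, it neither updates the table nor creates a new table in the Data Catalog. However, when I crawl both files at the same time, both of them are added.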

The log file gives the following information:
INFO : Created partitions with values [[new file name]] for table
BENCHMARK : Finished writing to Catalog
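
The first log line suggests the crawler may have registered the new file as a partition of the existing table rather than changing the table definition. One way to check is to list the partitions the Data Catalog holds for that table. The following is a minimal sketch, not a confirmed diagnosis: the function name `list_partition_values` is hypothetical, and the client is assumed to be a boto3 Glue client (`boto3.client("glue")`).

```python
from typing import Any, List


def list_partition_values(glue_client: Any, database: str, table: str) -> List[List[str]]:
    """Return the value list of every partition registered for a catalog table.

    glue_client is assumed to be a boto3 Glue client; database and table
    are the names of the catalog database and table to inspect.
    """
    # get_partitions is paginated, so walk every page of results.
    paginator = glue_client.get_paginator("get_partitions")
    values: List[List[str]] = []
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        for partition in page.get("Partitions", []):
            values.append(partition["Values"])
    return values
```

With real credentials this could be called as `list_partition_values(boto3.client("glue"), "my_database", "my_table")`, where both names are placeholders. If the new file name shows up in the result, the crawler did register the file, just as a partition rather than as a table change.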

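I have tried with and without "Create a single schema for each S3 path", but the crawler still does not update the table with the new file. Soon I will be adding new files every day for my analysis. Is there a solution?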

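1 answer

In my opinion, the best way to solve this is to write the AWS Glue DataBrew output directly to the Data Catalog. The Data Catalog can be updated either by a crawler or directly by DataBrew, but the recommended practice is to use one of these mechanisms, not both at the same time.

Can you try running the job with the Data Catalog as its output and let DataBrew manage your catalog? It should update your catalog table with the correct data and files.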
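
For illustration, here is a rough sketch of what pointing a DataBrew recipe job at a Data Catalog table might look like with boto3. The helper names (`build_catalog_output`, `use_catalog_output`) are hypothetical, the client is assumed to be `boto3.client("databrew")`, and the catalog database and table are assumed to exist already. It sets `Overwrite` to false on the assumption that each daily run should add files rather than replace them; adjust as needed.

```python
from typing import Any, Dict


def build_catalog_output(database: str, table: str, bucket: str, key: str,
                         overwrite: bool = False) -> Dict[str, Any]:
    """Build one DataCatalogOutputs entry for a DataBrew recipe job.

    bucket/key give the S3 location where the job writes the files
    that back the catalog table.
    """
    return {
        "DatabaseName": database,
        "TableName": table,
        "S3Options": {"Location": {"Bucket": bucket, "Key": key}},
        "Overwrite": overwrite,
    }


def use_catalog_output(databrew_client: Any, job_name: str, role_arn: str,
                       catalog_output: Dict[str, Any]) -> Dict[str, Any]:
    """Update an existing recipe job so that it writes to the Data Catalog.

    databrew_client is assumed to be a boto3 DataBrew client.
    """
    return databrew_client.update_recipe_job(
        Name=job_name,
        RoleArn=role_arn,
        DataCatalogOutputs=[catalog_output],
    )
```

With this setup the crawler would no longer be needed for that table, which matches the advice above to let only one mechanism maintain the catalog.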




