
Glue crawler creating multiple tables

Original post: 2022-10-05 12:28:25 | Tags: amazon-web-services, aws-glue, aws-glue-data-catalog

I have 2 S3 buckets in the following format:

s3://bucket/{lob_name_1}/{table_name}/{current_date}/table_name.csv

s3://bucket/{lob_name_2}/{table_name}/{current_date}/table_name.csv

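The same table names exist under 2 different LOBs. We have one AWS Glue crawler per LOB. When the crawler runs for the first LOB, the tables are created as expected. When the crawler runs for the second LOB, the tables common to LOB 1 and LOB 2 are re-created under different names. Is there a way to prevent the additional tables from being created when the crawler for the second LOB runs?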
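To make the layout in the question concrete, here is a minimal sketch that builds and parses object keys of that shape. The helper names `build_key` and `parse_key` and the table name `orders` are hypothetical, and the ISO format used for `{current_date}` is an assumption, since the question does not say how the date folder is written.

```python
from datetime import date


def build_key(lob_name: str, table_name: str, current_date: date) -> str:
    """Build a key shaped like {lob_name}/{table_name}/{current_date}/table_name.csv.

    Assumption: the date folder is an ISO date (YYYY-MM-DD).
    """
    return f"{lob_name}/{table_name}/{current_date.isoformat()}/{table_name}.csv"


def parse_key(key: str) -> dict:
    """Split a key of the shape above back into its four components."""
    lob_name, table_name, current_date, filename = key.split("/")
    return {
        "lob_name": lob_name,
        "table_name": table_name,
        "current_date": current_date,
        "filename": filename,
    }


if __name__ == "__main__":
    day = date(2022, 10, 5)
    key_1 = build_key("lob_name_1", "orders", day)
    key_2 = build_key("lob_name_2", "orders", day)
    # Both keys carry the same table_name folder; only the LOB prefix differs.
    print(parse_key(key_1)["table_name"] == parse_key(key_2)["table_name"])
```

The two keys differ only in the LOB prefix, which is why both crawlers end up proposing the same table name.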

1 Answer

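You should be able to solve this with one crawler parameter:

Create a single schema for each S3 path: true

Configuration options

Schema updates in the data store: Ignore the change and don't update the table in the data catalog.

Inherit schema from table: Update all new and existing partitions with metadata from the table.

Object deletion in the data store: Ignore the change and don't update the table in the data catalog.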
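The console options above can also be set programmatically. The following is a minimal sketch using boto3's `update_crawler`, not a definitive implementation. The mapping from console labels to API values is my reading of the AWS Glue crawler configuration options: "Create a single schema for each S3 path" as `TableGroupingPolicy: CombineCompatibleSchemas`, "Ignore the change" as `LOG` for both update and delete behavior, and "Update all new and existing partitions with metadata from the table" as `AddOrUpdateBehavior: InheritFromTable`. The crawler name `lob2-crawler` and both function names are hypothetical.

```python
import json


def build_crawler_settings(crawler_name: str) -> dict:
    """Return keyword arguments for the Glue UpdateCrawler call.

    Assumption: these API values correspond to the console options in the answer.
    """
    configuration = {
        "Version": 1.0,
        # "Create a single schema for each S3 path: true"
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
        # "Update all new and existing partitions with metadata from the table"
        "CrawlerOutput": {"Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}},
    }
    return {
        "Name": crawler_name,
        # "Ignore the change and don't update the table in the data catalog"
        # for both schema updates and deleted objects.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"},
        # The API expects the configuration as a JSON string, not a dict.
        "Configuration": json.dumps(configuration),
    }


def apply_crawler_settings(crawler_name: str) -> None:
    """Apply the settings to an existing crawler (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the settings builder works without boto3

    glue = boto3.client("glue")
    glue.update_crawler(**build_crawler_settings(crawler_name))


if __name__ == "__main__":
    # Hypothetical crawler name; print the payload instead of calling AWS.
    print(json.dumps(build_crawler_settings("lob2-crawler"), indent=2))
```

Keeping the payload builder separate from the AWS call makes it possible to inspect the settings before they are applied to a real crawler.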

