
AWS Redshift - Failed to incorporate external table into local catalog

We're having a problem with one of our external tables in Redshift.

We have over 300 tables in AWS Glue which have been added to our Redshift cluster as an external schema called events. Most of the tables in events can be queried fine. But when querying one of the tables, called item_loaded, we get the following error:

select * from events.item_loaded limit 1;
ERROR:  XX000: Failed to incorporate external table "events"."item_loaded" into local catalog.
LOCATION:  localize_external_table, /home/ec2-user/padb/src/external_catalog/external_catalog_api.cpp:358

What's weird is that the table is in the catalog:

select *
from SVV_EXTERNAL_TABLES
where tablename = 'item_loaded';

-[ RECORD 1 ]-----+------------------------------------------
schemaname        | events
tablename         | item_loaded
location          | s3://my_bucket/item_loaded
input_format      | org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
output_format     | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
serialization_lib | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe 
serde_parameters  | {"serialization.format":"1"}
compressed        | 0
parameters        | {"EXTERNAL":"TRUE","parquet.compress":"SNAPPY","transient_lastDdlTime":"1504792238"}

AFAICT, this table is configured exactly the same way as the other tables in the same schema, which are working fine. I've tried creating a new external schema pointing to the same AWS Glue database, but the same issue occurs.

What else could I check? Is there anything that could cause a table to be removed from the catalog?

As per the forum post about the same error:

The number of columns in the external table exceeds the Redshift limits:

  • 1,600 columns per table for a local Redshift table
  • 1,598 columns for a Redshift Spectrum external table

You can verify the number of columns of the external table by querying svv_external_columns.
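A minimal sketch of that check, using the schema and table names from the question. It assumes svv_external_columns returns one row per column for the table, so the row count is the column count; whether the view can be read for a table that fails to load is not confirmed here.

```sql
-- Count the columns Redshift sees for the external table from the question.
-- A result above 1,598 would exceed the Redshift Spectrum limit quoted above.
select schemaname, tablename, count(*) as column_count
from svv_external_columns
where schemaname = 'events'
  and tablename = 'item_loaded'
group by schemaname, tablename;
```

Dropping the tablename filter and ordering by column_count descending would show which tables in the schema are closest to the limit.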

I ran into this problem very recently.

In addition to the above solution, a few more threads cover the same error:

  1. https://forums.aws.amazon.com/message.jspa?messageID=845538&tstart=0 (Solution by Joe)
  2. https://forums.aws.amazon.com/thread.jspa?messageID=780552 (Says the fix is incorporated)
  3. I was facing this issue with an IAM role that had AWS Glue Full Access. I deliberately added AthenaFullAccess as well and restarted the Redshift cluster, which resolved the issue. I'm not sure what caused the issue or why this resolved it.

It can also happen if there are typos in the config. For example, the following fails:

SECRET_ARN ' arn:aws:secretsmanager:us-east-1:123:secret:stage/data/redshift-rds'

and the following works:

SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123:secret:stage/data/redshift-rds'

Note the extra space at the beginning of the ARN in the failing version.
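For context, SECRET_ARN is a clause of CREATE EXTERNAL SCHEMA for federated queries. The statement below is only a sketch of where the value sits: the schema name, database, host, port, and IAM role are hypothetical placeholders, and only the SECRET_ARN value comes from the example above.

```sql
-- Hypothetical federated-query schema; every name except the SECRET_ARN value is a placeholder.
create external schema rds_stage
from postgres
database 'mydb' schema 'public'
uri 'my-rds-host.example.com' port 5432
iam_role 'arn:aws:iam::123:role/my-redshift-role'
-- The quoted value must start directly with "arn:", with no whitespace after the opening quote.
secret_arn 'arn:aws:secretsmanager:us-east-1:123:secret:stage/data/redshift-rds';
```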
