Unable to access shared Glue table from Jupyter notebook
I am using a Jupyter notebook to read data from a shared Glue table (governed by Lake Formation), via the awswrangler
library. I can read a test table from the sample database without issues. Note that the local_db
database mentioned below was created locally in the same AWS account where I run this query. The lf_shared_table
table is a resource link to the shared table.
import awswrangler as wr
df = wr.athena.read_sql_query(sql="SELECT * FROM lf_shared_table", database="local_db")
Error:
~/anaconda3/envs/python3/lib/python3.6/site-packages/awswrangler/_config.py in wrapper(*args_raw, **kwargs)
437 del args[name]
438 args = {**args, **keywords}
--> 439 return function(**args)
440
441 wrapper.__doc__ = _inject_config_doc(doc=function.__doc__, available_configs=available_configs)
~/anaconda3/envs/python3/lib/python3.6/site-packages/awswrangler/athena/_read.py in read_sql_query(sql, database, ctas_approach, categories, chunksize, s3_output, workgroup, encryption, kms_key, keep_files, ctas_database_name, ctas_temp_table_name, ctas_bucketing_info, use_threads, boto3_session, max_cache_seconds, max_cache_query_inspections, max_remote_cache_entries, max_local_cache_entries, data_source, params, s3_additional_kwargs, pyarrow_additional_kwargs)
878 s3_additional_kwargs=s3_additional_kwargs,
879 boto3_session=session,
--> 880 pyarrow_additional_kwargs=pyarrow_additional_kwargs,
881 )
882
~/anaconda3/envs/python3/lib/python3.6/site-packages/awswrangler/athena/_read.py in _resolve_query_without_cache(sql, database, data_source, ctas_approach, categories, chunksize, s3_output, workgroup, encryption, kms_key, keep_files, ctas_database_name, ctas_temp_table_name, ctas_bucketing_info, use_threads, s3_additional_kwargs, boto3_session, pyarrow_additional_kwargs)
593 use_threads=use_threads,
594 s3_additional_kwargs=s3_additional_kwargs,
--> 595 boto3_session=boto3_session,
596 )
597
~/anaconda3/envs/python3/lib/python3.6/site-packages/awswrangler/athena/_read.py in _resolve_query_without_cache_regular(sql, database, data_source, s3_output, keep_files, chunksize, categories, encryption, workgroup, kms_key, wg_config, use_threads, s3_additional_kwargs, boto3_session)
508 boto3_session=boto3_session,
509 categories=categories,
--> 510 metadata_cache_manager=_cache_manager,
511 )
512 return _fetch_csv_result(
~/anaconda3/envs/python3/lib/python3.6/site-packages/awswrangler/athena/_utils.py in _get_query_metadata(query_execution_id, boto3_session, categories, query_execution_payload, metadata_cache_manager)
259 _query_execution_payload: Dict[str, Any] = query_execution_payload
260 else:
--> 261 _query_execution_payload = wait_query(query_execution_id=query_execution_id, boto3_session=boto3_session)
262 cols_types: Dict[str, str] = get_query_columns_types(
263 query_execution_id=query_execution_id, boto3_session=boto3_session
~/anaconda3/envs/python3/lib/python3.6/site-packages/awswrangler/athena/_utils.py in wait_query(query_execution_id, boto3_session)
795 _logger.debug("StateChangeReason: %s", response["Status"].get("StateChangeReason"))
796 if state == "FAILED":
--> 797 raise exceptions.QueryFailed(response["Status"].get("StateChangeReason"))
798 if state == "CANCELLED":
799 raise exceptions.QueryCancelled(response["Status"].get("StateChangeReason"))
QueryFailed: HIVE_METASTORE_ERROR: Required Table Storage Descriptor is not populated. (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)
Any suggestions would be appreciated.
I ran into the same error when querying a shared table. We were missing an AWS Lake Formation permission: adding lakeformation:GetDataAccess
to the IAM role (in addition to the Glue and S3 permissions) fixed the problem. The policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Lakeformation",
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "lakeformation:GrantPermissions"
            ],
            "Resource": "*"
        }
    ]
}
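The policy above can also be attached to the role programmatically. A minimal sketch using boto3 (the role name and inline policy name below are placeholders, not from the original answer):

```python
import json


def build_lakeformation_policy() -> dict:
    # Reproduces the IAM policy document from the answer above.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Lakeformation",
                "Effect": "Allow",
                "Action": [
                    "lakeformation:GetDataAccess",
                    "lakeformation:GrantPermissions",
                ],
                "Resource": "*",
            }
        ],
    }


def attach_policy(role_name: str) -> None:
    # Attaches the statement as an inline policy on the given role.
    # Requires AWS credentials; boto3 is imported lazily so
    # build_lakeformation_policy() stays usable without it.
    import boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="LakeFormationDataAccess",
        PolicyDocument=json.dumps(build_lakeformation_policy()),
    )
```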
I am not sure why this is necessary, but I believe access to the Glue Data Catalog (the Hive metastore) is mediated by Lake Formation. When the Athena connector cannot access the table metadata, it throws this somewhat misleading error. I spent far too long chasing the problem in the Glue permissions.
More background can be found in the AWS Lake Formation data access documentation.
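To check what Lake Formation has actually granted on the table, you can list its permissions with boto3's Lake Formation ListPermissions API. A minimal sketch (helper names here are hypothetical):

```python
def build_table_resource(database: str, table: str) -> dict:
    # Builds the Resource argument for Lake Formation's ListPermissions API.
    return {"Table": {"DatabaseName": database, "Name": table}}


def list_table_permissions(database: str, table: str) -> list:
    # Returns (principal, permissions) pairs granted on the table.
    # Requires AWS credentials; boto3 is imported lazily so
    # build_table_resource() stays usable without it.
    import boto3

    lf = boto3.client("lakeformation")
    resp = lf.list_permissions(
        Resource=build_table_resource(database, table)
    )
    return [
        (p["Principal"]["DataLakePrincipalIdentifier"], p["Permissions"])
        for p in resp.get("PrincipalResourcePermissions", [])
    ]
```

If the querying role does not appear in the output for the resource link's target table, the HIVE_METASTORE_ERROR above is the likely symptom.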