
AWS role vs IAM credentials in a DuckDB HTTPFS call

I am pretty baffled and don't know what is going on with this one.

I'm using DuckDB to query Parquet files in an S3 bucket.

import pandas as pd  # needed by .df()
import duckdb

query = """
    INSTALL httpfs;
    LOAD httpfs;
    SET s3_region='us-west-2';
    SET s3_access_key_id='key';
    SET s3_secret_access_key='secret';
    SELECT *
        FROM read_parquet('s3://bucket/folder/file.parquet');
"""

cursor = duckdb.connect()

cursor.execute(query).df()

I have an IAM user with admin access, and I am able to query this Parquet file with that user's programmatic access keys. I also have a role that I want to use in an application; I have given it admin access as well, just for testing purposes.

When I assume the role, create temporary credentials, and put those into the code above:

export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
$(aws sts assume-role \
--role-arn arn:aws:iam::<account-id>:role/<role-name> \
--role-session-name test-session \
--query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
--output text))

I get the error:

duckdb.Error: Invalid Error: Unable to connect to URL "s3://bucket/folder/file.parquet": 403 (Forbidden)

However, when I use my IAM user, I can access this S3 object and query the data just fine. Is there something I am missing about the difference between roles and IAM users?

If it helps, what I am trying to do is create a role for a Lambda function and then read the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY with os.getenv() in the code above. I believe that if I can get the role working by pasting in the temporary credentials, it should also work when I use os.getenv() in the Lambda function.

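I had a very similar issue. After also setting the session token via SET s3_session_token='sessiontoken'; it worked. Temporary credentials returned by sts assume-role are only valid together with their session token, so a request signed with just the access key ID and secret access key is rejected with 403.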

The code would be changed to:

import pandas as pd  # needed by .df()
import duckdb

query = """
    INSTALL httpfs;
    LOAD httpfs;
    SET s3_region='us-west-2';
    SET s3_access_key_id='key';
    SET s3_secret_access_key='secret';
    SET s3_session_token='session-token';
    SELECT *
        FROM read_parquet('s3://bucket/folder/file.parquet');
"""

cursor = duckdb.connect()

cursor.execute(query).df()
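
For the Lambda case mentioned in the question, the same idea can be sketched by reading all three credential values from the environment instead of hard-coding them. This is only a rough sketch under a few assumptions: the helper names `s3_settings_from_env` and `query_parquet` are made up for illustration, the region falls back to the `us-west-2` used above when `AWS_REGION` is not set, and the role's credentials are exposed through the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` variables.

```python
import os


def s3_settings_from_env(env=os.environ, region="us-west-2"):
    """Build DuckDB SET statements from the standard AWS environment variables.

    Hypothetical helper for illustration; not part of DuckDB itself.
    """

    def quote(value):
        # Double any single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    statements = [
        f"SET s3_region={quote(env.get('AWS_REGION', region))};",
        f"SET s3_access_key_id={quote(env['AWS_ACCESS_KEY_ID'])};",
        f"SET s3_secret_access_key={quote(env['AWS_SECRET_ACCESS_KEY'])};",
    ]
    # Temporary (role) credentials come with a session token; long-term
    # IAM user keys do not, so the setting is only added when it is present.
    token = env.get("AWS_SESSION_TOKEN")
    if token:
        statements.append(f"SET s3_session_token={quote(token)};")
    return statements


def query_parquet(path):
    """Run the query from the post using credentials taken from the environment."""
    import duckdb  # imported here so the helper above works without duckdb installed

    con = duckdb.connect()
    con.execute("INSTALL httpfs;")
    con.execute("LOAD httpfs;")
    for statement in s3_settings_from_env():
        con.execute(statement)
    return con.execute(f"SELECT * FROM read_parquet('{path}')").df()
```

Calling `query_parquet('s3://bucket/folder/file.parquet')` after running the `export` command from the question should then pick up the key, secret and session token together, so the token cannot be forgotten.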
