Creating a Glue data catalog through the Athena SDK

Question

I would like to use Athena to run queries on data in an S3 bucket in another AWS account. I am using the JavaScript SDK. From reading the documentation, I understand that I must first create a data catalog that points Athena to the correct S3 location.

I think I have to call the createDataCatalog method. Most of the arguments for this method are self-explanatory, except for the "parameters" argument, which seems to contain information about how the data catalog will be created. But I cannot find anywhere what these parameters should look like.

So my questions are:

What parameters should I provide here?
Is this the right way to create a Glue data catalog (including database and table)?
Once done, will this allow me to run Athena queries on the data catalog?

Answer 1

For a simple use case with static S3 data:

First, create a Glue table that points to the S3 location, using the Glue createTable API. The CLI documentation has a few examples.
Then run queries against this Glue table from Athena.
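On the "parameters" argument asked about in the question: as far as I know, when a catalog of Type "GLUE" is registered through createDataCatalog, Parameters takes a single key, "catalog-id", set to the ID of the AWS account that owns the Glue catalog. The call only registers an existing catalog with Athena; it does not create databases or tables. A minimal sketch follows, where the helper name buildGlueCatalogParams, the catalog name, and the account ID are made up for illustration:

```javascript
// Builds the request object for Athena's createDataCatalog when Type is "GLUE".
// Assumption: Parameters carries one key, "catalog-id", holding the 12-digit
// ID of the AWS account that owns the Glue catalog.
function buildGlueCatalogParams(catalogName, ownerAccountId) {
  if (!/^\d{12}$/.test(ownerAccountId)) {
    throw new Error("ownerAccountId must be a 12-digit AWS account ID string");
  }
  return {
    Name: catalogName,
    Type: "GLUE",
    Description: "Glue catalog owned by account " + ownerAccountId,
    Parameters: { "catalog-id": ownerAccountId },
  };
}

// Hypothetical values, for illustration only.
const params = buildGlueCatalogParams("shared_catalog", "123456789012");
console.log(JSON.stringify(params, null, 2));

// With aws-sdk installed and credentials configured, the call would be:
//   const AWS = require("aws-sdk");
//   new AWS.Athena({ region: "us-east-1" }).createDataCatalog(params, callback);
```

For cross-account access, the owning account also has to grant your account permission on the Glue catalog and on the S3 bucket; registering the catalog does not do that by itself.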

Here is an example that creates a Glue database and table:

const AWS = require("aws-sdk");
AWS.config.update({ region: "us-east-1" });

const glue = new AWS.Glue();
const dbName = "test-db";
glue.createDatabase(
  {
    DatabaseInput: {
      Name: dbName,
    },
  },
  function (dbCrtErr, dbRsp) {
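    // dbCrtErr is null on success, so guard before reading its fields.
    const dbExists = dbCrtErr && dbCrtErr.code === "AlreadyExistsException";
    if (dbExists || dbRsp) {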
      console.log("dbRsp", dbRsp);
      glue.createTable(
        {
          DatabaseName: dbName,
          TableInput: {
            Name: "my-table",
            Parameters: {
              classification: "json",
              compressionType: "none",
            },
            TableType: "EXTERNAL_TABLE",
            StorageDescriptor: {
              Location: "s3://my-s3-bucket-with-events/",
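              // Note: these formats describe Parquet files, while the
              // classification above says "json". Both should match the
              // actual format of the data in the bucket.
              InputFormat:
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
              OutputFormat:
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",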
              Columns: [
                {
                  Name: "id",
                  Type: "string",
                },
                {
                  Name: "name",
                  Type: "string",
                },
              ],
            },
          },
        },
        function (error, response) {
          console.log("error", error, "response", response);
        }
      );
    } else {
      console.log("dbCrtErr", dbCrtErr);
    }
  }
);
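
Once the table exists, the second step (querying it from Athena) could look roughly like the sketch below. It uses the SDK v2 Athena calls startQueryExecution, getQueryExecution, and getQueryResults; the helper names, the results bucket, and the polling interval are assumptions of mine, not part of the original answer. The client is passed in rather than created inside, so the block loads without the SDK installed.

```javascript
// Builds the request object for Athena's startQueryExecution.
function buildQueryParams(database, sql, outputLocation) {
  return {
    QueryString: sql,
    QueryExecutionContext: { Database: database },
    // Athena writes result files to this S3 location.
    ResultConfiguration: { OutputLocation: outputLocation },
  };
}

// Starts the query and polls until it reaches a terminal state.
// `athena` is expected to be an AWS SDK v2 Athena client (new AWS.Athena()).
async function runQuery(athena, params, pollMs = 1000) {
  const { QueryExecutionId } = await athena.startQueryExecution(params).promise();
  for (;;) {
    const { QueryExecution } = await athena
      .getQueryExecution({ QueryExecutionId })
      .promise();
    const state = QueryExecution.Status.State;
    if (state === "SUCCEEDED") {
      return athena.getQueryResults({ QueryExecutionId }).promise();
    }
    if (state === "FAILED" || state === "CANCELLED") {
      throw new Error("Query " + state + ": " + QueryExecution.Status.StateChangeReason);
    }
    // Still QUEUED or RUNNING: wait before checking again.
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}

const queryParams = buildQueryParams(
  "test-db",
  'SELECT id, name FROM "my-table" LIMIT 10',
  "s3://my-athena-results-bucket/" // hypothetical results bucket
);
console.log(queryParams);

// With aws-sdk installed and credentials configured:
//   const AWS = require("aws-sdk");
//   runQuery(new AWS.Athena({ region: "us-east-1" }), queryParams)
//     .then((res) => console.log(res.ResultSet.Rows))
//     .catch(console.error);
```

The hyphenated table name from the example is double-quoted in the query. Athena's naming guidance favors underscores over hyphens, so renaming the database and table may turn out simpler.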

Solution 1
1, accepted, 2021-02-16 01:03:09
