简体   繁体   中英

Create Glue data catalog via Athena SDK

I would like to use Athena to run queries on data in an S3 bucket in another AWS account. I am using Javascript SDK. Reading through the documentation , I understand that I must first create a data catalog that will point Athena to the correct S3 location.

I think that I have to call the createDataCatalog method. Most of the arguments for this method are self-explanatory, except for the "parameters" argument, which seems to contain information about how the data catalog will be created. But I am unable to find anywhere how these parameters should look.

So my questions are:

  1. What are the parameters to provide to here?
  2. Is this the right way to create a glue data catalog (including database and table)?
  3. Once done, will this allow me to run Athena queries on the data catalog?

For a simple use case with static S3 data,

  • We first need to create Glue Table using Glue createTable API pointing to S3 location. Few Examples in cli documentation .
  • Run queries against this Glue Table from Athena

Here is an example to create Glue Database and Table

const AWS = require("aws-sdk");
AWS.config.update({ region: "us-east-1" });

const glue = new AWS.Glue();
const dbName = "test-db";
glue.createDatabase(
  {
    DatabaseInput: {
      Name: dbName,
    },
  },
  function (dbCrtErr, dbRsp) {
    if (dbCrtErr.message === "Database already exists." || dbRsp) {
      console.log("dbRsp", dbRsp);
      glue.createTable(
        {
          DatabaseName: dbName,
          TableInput: {
            Name: "my-table",
            Parameters: {
              classification: "json",
              compressionType: "none",
            },
            TableType: "EXTERNAL_TABLE",
            StorageDescriptor: {
              Location: "s3://my-s3-bucket-with-events/",
              InputFormat:
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
              OutputFormat:
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
              Columns: [
                {
                  Name: "id",
                  Type: "string",
                },
                {
                  Name: "name",
                  Type: "string",
                },
              ],
            },
          },
        },
        function (error, response) {
          console.log("error", error, "response", response);
        }
      );
    } else {
      console.log("dbCrtErr", dbCrtErr);
    }
  }
);

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM