
How to create a data catalog in AWS Glue externally?

I want to create a data catalog in AWS Glue externally. Is there any way to do this?

The AWS Glue Data Catalog holds metadata about various data sources within AWS, e.g. S3 and DynamoDB. Instead of using crawlers or the AWS Console, you can populate the Data Catalog directly through the AWS Glue API, which exposes structures such as Database and Table. AWS provides SDKs for several languages, e.g. boto3 for Python, which has an easy-to-use API. So as long as you know how your data is structured, you can use the methods below.

Create Database definition:

from pprint import pprint
import boto3

client = boto3.client('glue')
response = client.create_database(
    DatabaseInput={
        'Name': 'my_database',  # Required
        'Description': 'Database created with boto3 API',
        'Parameters': {
            'my_param_1': 'my_param_value_1'
        },
    }
)
pprint(response)

# Output
{
    'ResponseMetadata': {
        'HTTPHeaders': {
            'connection': 'keep-alive',
            'content-length': '2',
            'content-type': 'application/x-amz-json-1.1',
            'date': 'Fri, 11 Oct 2019 12:37:12 GMT',
            'x-amzn-requestid': '12345-67890'
        },
        'HTTPStatusCode': 200,
        'RequestId': '12345-67890',
        'RetryAttempts': 0
    }
}
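
If the script may run more than once, a second create_database call with the same name raises AlreadyExistsException. A minimal sketch of a wrapper that tolerates this is shown here; the helper name ensure_database is hypothetical and not part of boto3, and it assumes you pass in a Glue client such as the one created above.

```python
def ensure_database(client, name, description=None):
    """Create a Glue database unless it already exists.

    `client` is assumed to be a boto3 Glue client (boto3.client('glue')).
    Returns True if the database was created, False if it was already there.
    """
    database_input = {'Name': name}
    if description:
        database_input['Description'] = description
    try:
        client.create_database(DatabaseInput=database_input)
        return True
    except client.exceptions.AlreadyExistsException:
        # The database is already in the catalog; nothing to do.
        return False

# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# ensure_database(boto3.client('glue'), 'my_database', 'Database created with boto3 API')
```

Other errors (missing permissions, invalid names) are deliberately not caught, so they still surface to the caller.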


Create Table definition:

response = client.create_table(
    DatabaseName='my_database',
    TableInput={
        'Name': 'my_table',
        'Description': 'Table created with boto3 API',
        'StorageDescriptor': {
            'Columns': [
                {
                    'Name': 'my_column_1',
                    'Type': 'string',
                    'Comment': 'This is very useful column',
                },
                {
                    'Name': 'my_column_2',
                    'Type': 'string',
                    'Comment': 'This is not as useful',
                },
            ],
            'Location': 's3://some/location/on/s3',
        },
        'Parameters': {
            'classification': 'json',
            'typeOfData': 'file',
        }
    }
)

pprint(response)

# Output
{
    'ResponseMetadata': {
        'HTTPHeaders': {
            'connection': 'keep-alive',
            'content-length': '2',
            'content-type': 'application/x-amz-json-1.1',
            'date': 'Fri, 11 Oct 2019 12:38:57 GMT',
            'x-amzn-requestid': '67890-12345'
        },
        'HTTPStatusCode': 200,
        'RequestId': '67890-12345',
        'RetryAttempts': 0
    }
}
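
When the schema is already known as a mapping of column names to types, the TableInput dictionary can be generated rather than written by hand. The following is only a sketch: build_table_input is a hypothetical helper, not a boto3 function, and it fills in only the same fields used in the example above.

```python
def build_table_input(name, columns, location, classification='json'):
    """Build a TableInput dict for create_table from a {column: type} mapping.

    Only the fields used in the example above are set; a real table
    may also need InputFormat, OutputFormat, SerdeInfo, and so on.
    """
    return {
        'Name': name,
        'StorageDescriptor': {
            'Columns': [
                {'Name': col_name, 'Type': col_type}
                for col_name, col_type in columns.items()
            ],
            'Location': location,
        },
        'Parameters': {
            'classification': classification,
        },
    }

table_input = build_table_input(
    'my_table',
    {'my_column_1': 'string', 'my_column_2': 'string'},
    's3://some/location/on/s3',
)

# Usage (requires AWS credentials, so it is left commented out):
# client.create_table(DatabaseName='my_database', TableInput=table_input)
```

Since Python dictionaries preserve insertion order, the columns appear in the catalog in the order they are listed in the mapping.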

