This CloudFormation template works as expected and creates all the resources required by this article:
Data visualization and anomaly detection using Amazon Athena and Pandas from Amazon SageMaker | AWS Machine Learning Blog
But the WorkflowStartTrigger
resource does not actually run the crawler. How do I run a crawler using the CloudFormation template?
Resources:
MyRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Principal:
Service:
- "glue.amazonaws.com"
Action:
- "sts:AssumeRole"
Path: "/"
Policies:
-
PolicyName: "root"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Action: "*"
Resource: "*"
MyDatabase:
Type: AWS::Glue::Database
Properties:
CatalogId: !Ref AWS::AccountId
DatabaseInput:
Name: "dbcrawler123"
Description: "TestDatabaseDescription"
LocationUri: "TestLocationUri"
Parameters:
key1 : "value1"
key2 : "value2"
MyCrawler2:
Type: AWS::Glue::Crawler
Properties:
Description: example classifier
Name: "testcrawler123"
Role: !GetAtt MyRole.Arn
DatabaseName: !Ref MyDatabase
Targets:
S3Targets:
- Path: 's3://nytaxi162/'
SchemaChangePolicy:
UpdateBehavior: "UPDATE_IN_DATABASE"
DeleteBehavior: "LOG"
TablePrefix: test-
Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}"
WorkflowStartTrigger:
Type: AWS::Glue::Trigger
Properties:
Description: Trigger for starting the Crawler
Name: StartTrigger
Type: ON_DEMAND
Actions:
- CrawlerName: "testcrawler123"
You should be able to do that by creating a custom resource attached to a lambda whereby the lambda actually does the action of starting the crawler. You should be able to even make it wait for the crawler to complete its execution
CloudFormation directly doesn't run crawlers, it just create them. But you can create a schedule in order to run a crawler while defining trigger:
ScheduledJobTrigger:
Type: 'AWS::Glue::Trigger'
Properties:
Type: SCHEDULED
StartOnCreation: true
Description: DESCRIPTION_SCHEDULED
Schedule: cron(5 * * * ? *)
Actions:
- CrawlerName: "testcrawler123"
Name: ETLGlueTrigger
If needed to run crawler as part of CloudFormation stack creation, Lambda could be used.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.