
Run a crawler using CloudFormation template

This CloudFormation template works as expected and creates all the resources required by this article:

Data visualization and anomaly detection using Amazon Athena and Pandas from Amazon SageMaker | AWS Machine Learning Blog

But the WorkflowStartTrigger resource does not actually run the crawler. How do I run a crawler using the CloudFormation template?

Resources:
  MyRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        -
          PolicyName: "root"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action: "*"
                Resource: "*"
 
  MyDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: "dbcrawler123"
        Description: "TestDatabaseDescription"
        LocationUri: "TestLocationUri"
        Parameters:
          key1 : "value1"
          key2 : "value2"
 
  MyCrawler2:
    Type: AWS::Glue::Crawler
    Properties:
      Description: example classifier
      Name: "testcrawler123"
      Role: !GetAtt MyRole.Arn
      DatabaseName: !Ref MyDatabase
      Targets:
        S3Targets:
          - Path: 's3://nytaxi162/'
      SchemaChangePolicy:
        UpdateBehavior: "UPDATE_IN_DATABASE"
        DeleteBehavior: "LOG"
      TablePrefix: test-
      Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}"


  WorkflowStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Description: Trigger for starting the Crawler
      Name: StartTrigger
      Type: ON_DEMAND
      Actions:
        - CrawlerName: "testcrawler123"

You should be able to do this with a custom resource backed by a Lambda function, where the Lambda function is what actually starts the crawler. You should even be able to make it wait for the crawler to finish its run before reporting back to CloudFormation.
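A minimal sketch of such a Lambda handler is below, not a definitive implementation. It assumes the custom resource passes the crawler name in a property called `CrawlerName` and an optional `Wait` flag; both property names, the `start_crawler` and `send_response` helper names, and the timeout values are made up for illustration. The Glue calls used (`start_crawler`, `get_crawler`) are standard boto3 Glue client methods. The response is sent to the pre-signed `ResponseURL` that CloudFormation includes in every custom resource event.

```python
import json
import time
import urllib.request


def start_crawler(glue, name, wait=False, poll_seconds=15, timeout_seconds=600):
    """Start the crawler; optionally poll until its state returns to READY."""
    glue.start_crawler(Name=name)
    if not wait:
        return "STARTED"
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":  # crawler is idle again, so the run has ended
            return "READY"
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawler {name} did not finish in {timeout_seconds}s")


def send_response(event, status, reason="", data=None):
    """Report the result to CloudFormation via the pre-signed ResponseURL."""
    body = json.dumps({
        "Status": status,
        "Reason": reason or "See CloudWatch Logs for details",
        "PhysicalResourceId": event.get("PhysicalResourceId", "start-crawler"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"], data=body, method="PUT",
        headers={"Content-Type": ""},
    )
    urllib.request.urlopen(request)


def handler(event, context, glue=None, respond=send_response):
    """Custom resource entry point: start the crawler on Create only."""
    try:
        if event["RequestType"] == "Create":
            if glue is None:
                import boto3  # provided by the Lambda Python runtime
                glue = boto3.client("glue")
            props = event["ResourceProperties"]
            # CloudFormation passes property values as strings.
            wait = props.get("Wait") == "true"  # "Wait" is a hypothetical property
            state = start_crawler(glue, props["CrawlerName"], wait=wait)
            respond(event, "SUCCESS", data={"State": state})
        else:
            # Nothing to undo on Update or Delete; just acknowledge.
            respond(event, "SUCCESS")
    except Exception as exc:
        # Always answer, otherwise the stack hangs until the resource times out.
        respond(event, "FAILED", reason=str(exc))
```

In the template, the custom resource would point its `ServiceToken` at this function's ARN and declare a dependency on the crawler so that the crawler exists before the function runs. Since a Lambda invocation is capped at 15 minutes, waiting for completion only suits crawls that finish well within that limit.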

CloudFormation does not run crawlers directly; it only creates them. You can, however, define a scheduled trigger that runs the crawler:

ScheduledJobTrigger:
  Type: 'AWS::Glue::Trigger'
  Properties:
    Type: SCHEDULED
    StartOnCreation: true
    Description: DESCRIPTION_SCHEDULED
    Schedule: cron(5 * * * ? *)
    Actions:
      - CrawlerName: "testcrawler123"
    Name: ETLGlueTrigger

If the crawler needs to run as part of CloudFormation stack creation, a Lambda function could be used.
