Unable to create an AWS Glue crawler in a Lambda function triggered by a Step Function

Question

This is my scenario:

API Gateway/Lambda triggers a Step Function. The payload to the Step Function is a SQL query to run on an existing Athena table.
Task-1 of the Step Function calls the StartQueryExecution API on Athena. The query runs successfully and writes its results to a given S3 bucket.
Task-2 invokes a Lambda which creates an AWS Glue crawler based on the results from Task-1 (Task-2 gets the S3 file location as input from Task-1).
Task-3 invokes a Lambda that runs the crawler created in Task-2.

To create the AWS Glue crawler in Lambda, this is my code (NodeJS):

const aws = require('aws-sdk');

exports.handler = async (event) => {
    const awsglue = new aws.Glue();

    // Both values come from the Step Function input (output of Task-1).
    const uuid = event.QueryExecutionID;
    var path = event.OutputPath;

    var params = {
        Name: uuid,
        Role: '<Role ARN>',
        DatabaseName: '<Database name>',
        Targets: {
            S3Targets: [{
                Path: path
            }]
        }
    };

    var request = await awsglue.createCrawler(params, (err, data) => {
        if (err) console.log(err, err.stack);
        else console.log(data);
    });

    const response = {
        statusCode: 200,
        body: JSON.stringify(uuid),
    };
    return response;
};

Problem: because createCrawler is an asynchronous call, the Lambda returns SUCCESS even before the crawler has been created. Consequently, Task-3, which is supposed to run the crawler, fails.

To work around this problem I tried combining createCrawler and startCrawler in the same Lambda function, but that doesn't work either.

Am I missing something? Is it not possible to create an AWS Glue crawler in a Lambda function that is triggered by a Step Function?

Answer 1

The Glue crawler should be created together with the Step Functions state machine and managed by infrastructure as code, such as CloudFormation, Terraform, or AWS CDK.

The Lambda function then starts the crawler and retrieves the result of the crawler's run.
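A minimal sketch of that start-and-check step, assuming the AWS SDK for JavaScript v2 (where each request object exposes `.promise()`). The helper name `runCrawlerAndWait` and the `pollMs`/`maxPolls` options are hypothetical, and the `glue` client is passed in by the caller (for example `new aws.Glue()`), so the logic does not depend on how the client is constructed:

```javascript
// Sketch only: starts an existing crawler and polls until the run has ended.
// `glue` is assumed to be an AWS SDK v2 Glue client supplied by the caller.
async function runCrawlerAndWait(glue, name, { pollMs = 5000, maxPolls = 60 } = {}) {
  // .promise() returns a real promise, so `await` actually waits for the API call.
  await glue.startCrawler({ Name: name }).promise();

  for (let i = 0; i < maxPolls; i++) {
    const { Crawler } = await glue.getCrawler({ Name: name }).promise();
    // Crawler states are RUNNING, STOPPING and READY; READY means the run is over.
    if (Crawler.State === 'READY') {
      // LastCrawl.Status is SUCCEEDED, CANCELLED or FAILED when present.
      return Crawler.LastCrawl ? Crawler.LastCrawl.Status : undefined;
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`Crawler ${name} did not finish after ${maxPolls} polls`);
}
```

Polling inside one Lambda is bounded by the function timeout, so for long crawls it may be preferable to return right after `startCrawler` and let the state machine poll `getCrawler` with its own Wait and Choice states.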

See this similar sample code:

Lambda function to start the crawler: https://github.com/awslabs/realtime-fraud-detection-with-gnn-on-dgl/blob/61b341eb1fed1b1b471c9fdccce0c348ff7f343f/src/lambda.d/crawl-data-catalog/index.ts#L24-L37

Create the crawler with CDK: https://github.com/awslabs/realtime-fraud-detection-with-gnn-on-dgl/blob/61b341eb1fed1b1b471c9fdccce0c348ff7f343f/src/lib/etl-glue.ts#L78-L95
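If the crawler does have to be created at runtime, as in the question, the early return most likely comes from passing a callback to `createCrawler` and then using `await` on the returned request object, which is not a promise. A hedged sketch of one way around that, assuming the AWS SDK for JavaScript v2: call `.promise()` on the request and await that instead. The helper name `createAndStartCrawler` is hypothetical, and `glue` and `params` are supplied by the caller:

```javascript
// Sketch only: creates a crawler and starts it, waiting for each API call.
// `glue` is assumed to be an AWS SDK v2 Glue client; `params` has the same
// shape as the createCrawler parameters in the question.
async function createAndStartCrawler(glue, params) {
  // No callback here: .promise() yields a promise that resolves only after
  // Glue has accepted the createCrawler request (and rejects on error).
  await glue.createCrawler(params).promise();

  // The crawler exists at this point, so it can be started by name.
  await glue.startCrawler({ Name: params.Name }).promise();
  return params.Name;
}
```

Inside the handler this would be called as `await createAndStartCrawler(new aws.Glue(), params)` before building the response, so the Lambda only reports success once both calls have completed.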

Answer 1, posted 2021-08-15 15:39:04
