
Cannot create an AWS Glue crawler in a Lambda triggered by a Step Function

This is my scenario:

  1. API Gateway/Lambda triggers a Step Function. The payload to the Step Function is a SQL query to run on an existing Athena table.

  2. Task-1 of the Step Function calls the StartQueryExecution API on Athena. The query runs successfully and writes its results to a given S3 bucket.

  3. Task-2 invokes a Lambda that creates an AWS Glue crawler based on the results from Task-1 (Task-2 gets the S3 file location as input from Task-1).

  4. Task-3 invokes a Lambda that runs the crawler created in Task-2.


To create the Glue crawler, this is the code in my Lambda (Node.js):

  const aws = require('aws-sdk');

  exports.handler = async (event) => {
      const awsglue = new aws.Glue();

      // Both values arrive from Task-1 via the Step Function input
      const uuid = event.QueryExecutionID;
      const path = event.OutputPath;

      const params = {
          Name: uuid,
          Role: '<Role ARN>',
          DatabaseName: '<Database name>',
          Targets: {
              S3Targets: [{
                  Path: path
              }]
          }
      };

      // createCrawler is given a callback, and its return value is awaited
      var request = await awsglue.createCrawler(params, (err, data) => {
          if (err) console.log(err, err.stack);
          else console.log(data);
      });

      const response = {
          statusCode: 200,
          body: JSON.stringify(uuid),
      };
      return response;
  };

Problem: createCrawler is an asynchronous call, so the Lambda returns SUCCESS before the crawler has been created. Consequently, Task-3, which is supposed to run the crawler, fails.

To work around this problem I tried combining createCrawler and startCrawler in the same Lambda function, but that doesn't work either.

Am I missing something? Is it not possible to create an AWS Glue crawler in a Lambda function that is triggered by a Step Function?

The Glue crawler should be created together with the Step Functions state machine and managed as infrastructure as code, using a tool such as CloudFormation, Terraform, or the AWS CDK.

The Lambda function then only starts the crawler and retrieves the result of the crawler's run.
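As a rough illustration of that start-and-check step, the sketch below starts a crawler and polls its state until the run has finished. It assumes an AWS SDK for JavaScript v2 Glue client is passed in. The function name `startAndWait` and the `pollMs` and `maxPolls` parameters are hypothetical, and this is not necessarily how the linked sample implements it.

```javascript
// Sketch only: assumes an AWS SDK v2 Glue client (requests expose .promise()).
// startAndWait, pollMs and maxPolls are hypothetical names.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function startAndWait(glue, name, pollMs = 15000, maxPolls = 40) {
  await glue.startCrawler({ Name: name }).promise();

  for (let i = 0; i < maxPolls; i++) {
    await sleep(pollMs);
    const { Crawler } = await glue.getCrawler({ Name: name }).promise();

    // State goes RUNNING, then STOPPING, and returns to READY when the run ends.
    if (Crawler.State === 'READY') {
      const status = Crawler.LastCrawl ? Crawler.LastCrawl.Status : 'UNKNOWN';
      if (status !== 'SUCCEEDED') {
        throw new Error(`Crawler ${name} finished with status ${status}`);
      }
      return status;
    }
  }
  throw new Error(`Crawler ${name} did not finish after ${maxPolls} polls`);
}
```

With these defaults the loop can run for up to ten minutes, so the Lambda timeout would have to be raised to cover it. An alternative is to keep the Lambda short and do the waiting in the state machine with Wait and Choice states.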

See this similar sample code:

Lambda function to start the crawler: https://github.com/awslabs/realtime-fraud-detection-with-gnn-on-dgl/blob/61b341eb1fed1b1b471c9fdccce0c348ff7f343f/src/lambda.d/crawl-data-catalog/index.ts#L24-L37

Creating the crawler with the CDK: https://github.com/awslabs/realtime-fraud-detection-with-gnn-on-dgl/blob/61b341eb1fed1b1b471c9fdccce0c348ff7f343f/src/lib/etl-glue.ts#L78-L95
