
How to run DBT in AWS Lambda?

I have dockerized my DBT solution and currently launch it in AWS Fargate (triggered from Airflow). However, Fargate needs about 1 minute to start running (image pull, resource provisioning, etc.), which is fine for long-running executions (hours) but not for short ones (1-5 minutes).

I'm trying to run my docker container in AWS Lambda instead of AWS Fargate for short executions, but I encountered several problems during this migration.

The one I cannot fix is related to the message below, which appears when running dbt deps --profiles-dir . && dbt run -t my_target --profiles-dir . --select my_model:

Running with dbt=0.21.0
Encountered an error:
[Errno 38] Function not implemented

It says a function is not implemented, but I cannot see anywhere which function that is. Since the error appears while installing the dbt packages (redshift and dbt_utils), I tried downloading them and including them in the docker image (setting local paths in packages.yml), but nothing changed. Moreover, DBT writes no logs at this phase (I set the log-path to /tmp in dbt_project.yml so that it has write permissions within the Lambda), so I'm blind.
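When dbt is launched from Python as a subprocess, one way to be less blind is to capture the child's stdout and stderr and echo them, so that whatever dbt prints ends up in the Lambda's own log output. The following is only a minimal sketch; the helper name run_and_log is hypothetical and not part of dbt or AWS.

```python
import subprocess
import sys


def run_and_log(command):
    """Run a command, echo its captured stdout/stderr, and return its exit code."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.stdout:
        print(completed.stdout, end="")
    if completed.stderr:
        print(completed.stderr, end="", file=sys.stderr)
    return completed.returncode
```

It could be called as run_and_log(["dbt", "run", "-t", "my_target", "--profiles-dir", ".", "--select", "my_model"]), checking the returned exit code afterwards.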

Digging into this problem, I found that it may be related to multiprocessing issues within AWS Lambda (my docker image contains python scripts), as described in https://github.com/dbt-labs/dbt-core/issues/2992 . I run DBT from python using the subprocess library.
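A quick way to check the multiprocessing hypothesis independently of dbt is to try creating a multiprocessing lock inside the Lambda. In environments without working POSIX semaphores, this typically fails with the same [Errno 38] Function not implemented. This is a diagnostic sketch only; the function name semaphores_available is hypothetical.

```python
import errno
import multiprocessing


def semaphores_available():
    """Return False if creating a multiprocessing lock fails with ENOSYS."""
    try:
        multiprocessing.Lock()
    except OSError as exc:
        # ENOSYS is "Function not implemented" (errno 38 on Linux).
        if exc.errno == errno.ENOSYS:
            return False
        raise
    return True


if __name__ == "__main__":
    print("multiprocessing semaphores available:", semaphores_available())
```

If this reports False in the Lambda but True locally or in Fargate, the error most likely comes from the runtime environment rather than from the dbt project itself.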

Since it may be a multiprocessing issue, I have also tried setting "threads": 1 in profiles.yml, but it did not solve the problem.

Has anyone succeeded in deploying DBT in AWS Lambda?

I've recently been trying to do this, and the summary of what I've found is that it seems to be possible, but isn't worth it.

You can pretty easily build a Lambda Layer that includes dbt and the provider you want to use, but you'll also need to patch the multiprocessing behavior and invoke dbt.main from within the Lambda code. Once you've jumped through all those hoops, you're left with a dbt instance that is limited to a relatively small upper bound on memory, a 15 minute maximum runtime, and is throttled to a single thread.
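As a rough illustration of what "patch the multiprocessing behavior and invoke dbt.main" could look like, here is a sketch that swaps the multiprocessing context for thread-based primitives before dbt is imported. Everything in it is an assumption to verify against your dbt version: the ThreadOnlyContext class, the patch_multiprocessing helper, and the "dbt_args" event key are hypothetical names; dbt may need more primitives than the three shown; and dbt.main.handle_and_check is, as far as I know, the programmatic entry point in dbt 0.21 but may differ elsewhere.

```python
import multiprocessing
import queue
import threading


class ThreadOnlyContext:
    """Hypothetical stand-in for a multiprocessing context, backed by threading."""

    def Lock(self):
        return threading.Lock()

    def RLock(self):
        return threading.RLock()

    def Queue(self, maxsize=0):
        return queue.Queue(maxsize)


def patch_multiprocessing():
    """Make multiprocessing.get_context() return the thread-only stand-in."""
    multiprocessing.get_context = lambda method=None: ThreadOnlyContext()


def handler(event, context):
    # The patch has to be applied before dbt is imported.
    patch_multiprocessing()
    import dbt.main  # assumption: dbt is packaged in the layer or image

    # "dbt_args" is a hypothetical event key used for illustration only.
    args = event.get("dbt_args", ["run", "--profiles-dir", "."])
    # Assumption: handle_and_check returns (results, success) in dbt 0.21.
    results, success = dbt.main.handle_and_check(args)
    return {"success": bool(success)}
```

Calling dbt in-process like this avoids a subprocess entirely, but it does not lift the memory, runtime, or single-thread limits described above.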

This discussion gives a rough example of what's needed to get it running in Lambda: https://github.com/dbt-labs/dbt-core/issues/2992#issuecomment-919288906

All that said, I'd love to put dbt on a Lambda and I hope dbt's multiprocessing will one day support it.
