简体   繁体   中英

How to run DBT in AWS Lambda?

I have currently dockerized my DBT solution and I launch it in AWS Fargate (triggered from Airflow). However, Fargate requires about 1 minute to start running (image pull + resource provisioning + etc.), which is great for long running executions (hours), but not for short ones (1-5 minutes).

I'm trying to run my docker container in AWS Lambda instead of in AWS Fargate for short executions, but I encountered several problems during this migration.

The one I cannot fix is related to the bellow message, at the time of running the dbt deps --profiles-dir. && dbt run -t my_target --profiles-dir. --select my_model dbt deps --profiles-dir. && dbt run -t my_target --profiles-dir. --select my_model

Running with dbt=0.21.0
Encountered an error:
[Errno 38] Function not implemented

It says there is no function implemented but I cannot see anywhere which is that function. As it appears at the time of installing dbt packages (redshift and dbt_utils), I tried to download them and include them in the docker image (set local paths in packages.yml ), but nothing changed. Moreover, DBT writes no logs at this phase (I set the log-path to /tmp in the dbt_project.yml so that it can have write permissions within the Lambda), so I'm blind.

Digging into this problem, I've found that this can be related to multiprocessing issues within AWS Lamba (my docker image contains python scripts), as stated in https://github.com/dbt-labs/dbt-core/issues/2992 . I run DBT from python using the subprocess library.

Since it may be a multiprocessing issue, I have also tried to set "threads": 1 in profiles.yml but it did not solve the problem.

Does anyone succeeded in deploying DBT in AWS Lambda?

I've recently been trying to do this, and the summary of what I've found is that it seems to be possible, but isn't worth it.

You can pretty easily build a Lambda Layer that includes dbt & the provider you want to use, but you'll also need to patch the multiprocessing behavior and invoke dbt.main from within the Lambda code. Once you've jumped through all those hops, you're left with a dbt instance that is limited to a relatively small upper bound on memory, a 15 minute maximum runtime, and is throttled to a single thread.

This discussion gives an rough example of what's needed to get it running in Lambda: https://github.com/dbt-labs/dbt-core/issues/2992#issuecomment-919288906

All that said, I'd love to put dbt on a Lambda and I hope dbt's multiprocessing will one day support it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM