
Running Jupyter Notebook on AWS Lambda

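I am trying to run a Jupyter notebook on AWS Lambda. I created a layer containing all the dependencies. The notebook is simple code that pulls a CSV file from Amazon S3 and displays the data as a bar chart. Below is the Lambda function I wrote to download the .ipynb file and execute the notebook with papermill. I don't know why it fails because it cannot find the boto3 module.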

import json
import sys
import os
import boto3
# papermill to execute notebook
import papermill as pm
import pandas as pd
import logging
import matplotlib.pyplot as plt

sys.path.append("/opt/bin")
sys.path.append("/opt/python")
os.environ["PYTHONPATH"]='/var/task'
os.environ["PYTHONPATH"]='/opt/python/'
os.environ["MPLCONFIGDIR"] = '/tmp/'
# ipython needs a writeable directory
os.environ["IPYTHONDIR"]='/tmp/ipythondir'
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    s3.meta.client.download_file('test-boto', 'testing.ipynb', '/tmp/test.ipynb')
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
    s3.meta.client.upload_file('/tmp/juptest_output.ipynb', 'test-boto', 'temp/juptest_output.ipynb')
    logger.info(event)

Error output:

START RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06 Version: $LATEST
[INFO]  2020-08-07T17:55:16.602Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Input Notebook:  /tmp/test.ipynb
[INFO]  2020-08-07T17:55:16.603Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Output Notebook: /tmp/juptest_output.ipynb

Executing:   0%|          | 0/15 [00:00<?, ?cell/s]
[INFO]   2020-08-07T17:55:17.311Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Executing notebook with kernel: python3
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k

Executing:   7%|▋         | 1/15 [00:01<00:14,  1.06s/cell]
Executing:   7%|▋         | 1/15 [00:01<00:20,  1.46s/cell]
[ERROR] PapermillExecutionError: 
---------------------------------------------------------------------------
Exception encountered at "In [1]":
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-9c332490c231> in <module>
      1 import pandas as pd
      2 import os
----> 3 import boto3
      4 import matplotlib.pyplot as plt
      5 client = boto3.client('s3')

ModuleNotFoundError: No module named 'boto3'

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 28, in lambda_handler
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
  File "/opt/python/papermill/execute.py", line 110, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/opt/python/papermill/execute.py", line 222, in raise_for_execution_errors
    raise error
END RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06
REPORT RequestId:c4da3406-c829-4f99-9fbf-b231a0d3dc06
    Duration: 1624.78 ms    Billed Duration: 1700 ms    Memory Size: 3008 MB    Max Memory Used: 293 MB

Jupyter notebook:

import pandas as pd
import os
import boto3
import matplotlib.pyplot as plt
client = boto3.client('s3')

path = 's3://test-boto/aws-costs-Owner-Month-08.csv'
monthly_owner = pd.read_csv(path)
plt.bar(monthly_owner.Owner.head(6),monthly_owner.Amount.head(6))
plt.xlabel('Owner', fontsize=15)
plt.ylabel('Amount', fontsize=15)
plt.title('AWS Monthly Cost by Owner')
plt.show()

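It looks like the kernel started by papermill cannot find the boto3 package, even though your Lambda handler can. I see that you are overwriting (rather than appending to) PYTHONPATH in your Lambda handler. This removes the other directories that PYTHONPATH would otherwise contribute to the package search path, and the papermill subprocess then runs with this reduced Python path.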
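The key point in this answer is that a child process, such as the kernel papermill launches, inherits the parent's environment variables, including PYTHONPATH. Below is a small, self-contained sketch of that effect using only the standard library. `fake_boto3` is a hypothetical stand-in module, and the two temporary directories play the roles of an existing PYTHONPATH entry and the Lambda layer; nothing here is taken from papermill itself.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def child_can_import(module: str, env: dict) -> bool:
    """Start a fresh Python process with `env` and report whether `module` imports."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=env,
        capture_output=True,
    )
    return result.returncode == 0


with tempfile.TemporaryDirectory() as layer_dir, tempfile.TemporaryDirectory() as pkg_dir:
    # Hypothetical stand-in for boto3: a module that is only reachable via pkg_dir.
    Path(pkg_dir, "fake_boto3.py").write_text("VALUE = 1\n")

    # The parent's environment: pkg_dir is already on PYTHONPATH.
    parent_env = dict(os.environ, PYTHONPATH=pkg_dir)

    # Overwriting, as in the question: only layer_dir survives.
    overwritten = dict(parent_env, PYTHONPATH=layer_dir)

    # Appending: the existing entry is kept and layer_dir is added after it.
    appended = dict(parent_env, PYTHONPATH=os.pathsep.join([pkg_dir, layer_dir]))

    overwrite_ok = child_can_import("fake_boto3", overwritten)
    append_ok = child_can_import("fake_boto3", appended)

print("overwrite:", overwrite_ok)  # overwrite: False
print("append:", append_ok)        # append: True
```

The child started with the overwritten PYTHONPATH cannot import the module, while the one started with the appended PYTHONPATH can, which mirrors the handler/kernel mismatch in the error log.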

You might also find Clouderizer useful. It lets you deploy Jupyter notebooks directly as serverless functions, and it uses papermill behind the scenes.

Disclaimer: I work for Clouderizer.
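Applied to the handler from the question, the "append, don't overwrite" fix might look like the sketch below. It assumes the same bucket and keys as the question, a layer that provides papermill and a `python3` kernel under `/opt/python`, and boto3 available in the Lambda runtime; `append_to_env_path` is a hypothetical helper, not part of any library. It also uses a single S3 client for both the download and the upload, since `s3_client` is never defined in the original handler.

```python
import os


def append_to_env_path(var: str, *dirs: str) -> str:
    """Hypothetical helper: append dirs to an os.pathsep-separated env var, keeping existing entries."""
    entries = [p for p in os.environ.get(var, "").split(os.pathsep) if p]
    for d in dirs:
        if d not in entries:
            entries.append(d)
    os.environ[var] = os.pathsep.join(entries)
    return os.environ[var]


# Extend, rather than replace, the search path that the papermill kernel subprocess inherits.
append_to_env_path("PYTHONPATH", "/var/task", "/opt/python")
os.environ["MPLCONFIGDIR"] = "/tmp/"
# IPython needs a writeable directory.
os.environ["IPYTHONDIR"] = "/tmp/ipythondir"

BUCKET = "test-boto"


def lambda_handler(event, context):
    # Imported here only so this sketch can be loaded where boto3/papermill are not installed.
    import boto3
    import papermill as pm

    s3 = boto3.client("s3")
    s3.download_file(BUCKET, "testing.ipynb", "/tmp/test.ipynb")
    pm.execute_notebook("/tmp/test.ipynb", "/tmp/juptest_output.ipynb", kernel_name="python3")
    s3.upload_file("/tmp/juptest_output.ipynb", BUCKET, "temp/juptest_output.ipynb")
```

The helper skips directories that are already present, so calling it again on a warm Lambda container does not keep growing PYTHONPATH.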

There seems to be a simpler way... just use an Amazon SageMaker notebook instance: https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html
