
Running Jupyter Notebook on AWS Lambda


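I'm trying to run a Jupyter notebook on AWS Lambda. I created a layer containing all the dependencies. The notebook is simple: it pulls a CSV file from Amazon S3 and displays the data as a bar chart. Below is the Lambda function I wrote to download the .ipynb file and execute the notebook with papermill. I can't figure out why it fails with boto3 not being found.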

import json
import sys
import os
import boto3
# papermill to execute notebook
import papermill as pm
import pandas as pd
import logging
import matplotlib.pyplot as plt

sys.path.append("/opt/bin")
sys.path.append("/opt/python")
os.environ["PYTHONPATH"]='/var/task'
os.environ["PYTHONPATH"]='/opt/python/'
os.environ["MPLCONFIGDIR"] = '/tmp/'
# ipython needs a writeable directory
os.environ["IPYTHONDIR"]='/tmp/ipythondir'
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
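    s3 = boto3.resource('s3')
    # Reuse the resource's low-level client for both download and upload
    s3_client = s3.meta.client
    # Download the notebook into Lambda's writable /tmp directory
    s3_client.download_file('test-boto', 'testing.ipynb', '/tmp/test.ipynb')
    # Execute it with papermill; the kernel runs in a separate process
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
    # Upload the executed notebook (including outputs) back to S3
    s3_client.upload_file('/tmp/juptest_output.ipynb', 'test-boto', 'temp/juptest_output.ipynb')
    logger.info(event)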

Error output:

START RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06 Version: $LATEST
[INFO]  2020-08-07T17:55:16.602Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Input Notebook:  /tmp/test.ipynb
[INFO]  2020-08-07T17:55:16.603Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Output Notebook: /tmp/juptest_output.ipynb

Executing:   0%|          | 0/15 [00:00<?, ?cell/s]
[INFO]   2020-08-07T17:55:17.311Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Executing notebook with kernel: python3
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k

Executing:   7%|▋         | 1/15 [00:01<00:14,  1.06s/cell]
Executing:   7%|▋         | 1/15 [00:01<00:20,  1.46s/cell]
[ERROR] PapermillExecutionError: 
---------------------------------------------------------------------------
Exception encountered at "In [1]":
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-9c332490c231> in <module>
      1 import pandas as pd
      2 import os
----> 3 import boto3
      4 import matplotlib.pyplot as plt
      5 client = boto3.client('s3')

ModuleNotFoundError: No module named 'boto3'

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 28, in lambda_handler
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
  File "/opt/python/papermill/execute.py", line 110, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/opt/python/papermill/execute.py", line 222, in raise_for_execution_errors
    raise error
END RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06
REPORT RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06    Duration: 1624.78 ms    Billed Duration: 1700 ms    Memory Size: 3008 MB    Max Memory Used: 293 MB

Jupyter notebook:

import pandas as pd
import os
import boto3
import matplotlib.pyplot as plt
client = boto3.client('s3')

path = 's3://test-boto/aws-costs-Owner-Month-08.csv'
monthly_owner = pd.read_csv(path)
plt.bar(monthly_owner.Owner.head(6),monthly_owner.Amount.head(6))
plt.xlabel('Owner', fontsize=15)
plt.ylabel('Amount', fontsize=15)
plt.title('AWS Monthly Cost by Owner')
plt.show()

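It looks like the papermill kernel cannot find the boto3 package, even though your Lambda handler can. I see that you are overwriting (rather than appending to) PYTHONPATH in your Lambda handler. That removes every other directory from PYTHONPATH, so those directories are no longer searched for packages. The kernel process that papermill starts then inherits this reduced PYTHONPATH. This also fits the traceback: `import pandas` in the notebook succeeds (pandas is found under /opt/python), while boto3 evidently lives in a directory that is no longer listed.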
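A minimal sketch of the fix described above, assuming the package directories are the `/var/task` and `/opt/python` paths from the question: extend PYTHONPATH with `os.pathsep` instead of assigning it, then check what a fresh child Python process sees on `sys.path`. Child processes inherit `os.environ` by default, which is how the kernel papermill starts ends up with the handler's PYTHONPATH. The helper names `extend_pythonpath` and `child_sys_path` are hypothetical, not part of papermill or the Lambda runtime.

```python
import json
import os
import subprocess
import sys


def extend_pythonpath(*dirs: str) -> str:
    """Append dirs to PYTHONPATH instead of overwriting it.

    Existing entries keep their order and duplicates are skipped,
    so directories that were already listed are preserved.
    Returns the new PYTHONPATH value.
    """
    current = os.environ.get("PYTHONPATH", "")
    entries = [p for p in current.split(os.pathsep) if p]
    for d in dirs:
        if d not in entries:
            entries.append(d)
    os.environ["PYTHONPATH"] = os.pathsep.join(entries)
    return os.environ["PYTHONPATH"]


def child_sys_path() -> list:
    """Return sys.path as seen by a fresh Python subprocess.

    The subprocess inherits os.environ (including PYTHONPATH),
    so this approximates what a separately launched kernel sees.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Paths taken from the question; adjust to wherever your packages live.
    print(extend_pythonpath("/var/task", "/opt/python"))
    print(child_sys_path())
```

In the handler, a call like `extend_pythonpath('/var/task', '/opt/python')` at module level would take the place of the two `os.environ["PYTHONPATH"] = ...` assignments, so it runs before `pm.execute_notebook` launches the kernel.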

You might also find Clouderizer useful. It lets you deploy Jupyter notebooks directly as serverless functions, and it uses papermill under the hood.

Disclaimer: I work for Clouderizer.

There seems to be a simpler way: just use an Amazon SageMaker notebook instance: https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html

