
Add Delta Lake packages to AWS EMR Notebook

The Delta jar delta-core_2.11-0.6.1.jar has been added to the EMR master node's "SPARK_HOME/jars" directory. However, when calling the Delta API from an EMR Notebook, I get the following error:

# The Notebook comes with a default Spark session, so I did not execute the following lines:
# spark = SparkSession.builder.appName("MyApp") \
#    .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1") \
#    .getOrCreate()

from delta.tables import * # ModuleNotFoundError: No module named 'delta'

The CLI command pyspark --packages "io.delta:delta-core_2.11:0.6.1" works fine on the master node; I am able to access the Delta APIs in CLI mode.

Is there any way I can use the Delta APIs directly in the Notebook? Please suggest.
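One possible workaround, assuming the EMR Notebook is backed by Sparkmagic/Livy (the default for EMR Notebooks): because the Spark session is created before any user code runs, a `spark.jars.packages` setting in a regular cell comes too late, but the `%%configure` magic in the first cell can pass it to the session before it starts. The JSON shape below follows Sparkmagic's documented session-configuration format:

```
%%configure -f
{ "conf": { "spark.jars.packages": "io.delta:delta-core_2.11:0.6.1" } }
```

After the session restarts with this configuration, `from delta.tables import *` should resolve, since the package's Python files ship inside the jar.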

The tables.py file containing the DeltaTable class can be found in the delta repo on GitHub. You can find it here - https://github.com/delta-io/delta/tree/master/python/delta

You can either clone the repo (remember to select the correct branch) or copy the file and upload it to Jupyter. Either way it'll need adding as a dependency, so you'll need something like:

import sys
sys.path.append('mnt/jupyterhome/<username>/<folder_containing_tables.py>')

Hopefully that'll get you up and running!
