Connecting a local Jupyter Hub to an Azure Databricks Spark cluster

Question

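I have an Azure Databricks cluster. Although it provides notebooks, my team is more familiar with JupyterLab, where they can upload offline CSV files and install Python packages. I want to set up a JupyterLab instance that can connect to the Spark cluster.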

Databricks does allow access through a remote kernel (https://databricks.com/blog/2019/12/03/jupyterlab-databricks-integration-bridge-local-and-remote-workflows.html), but that kernel cannot read files from the local JupyterLab.

Is there a way to use the Spark cluster together with a local JupyterLab, for example as in https://medium.com/ibm-data-ai/connect-to-remote-kerberized-hive-from-a-local-jupyter-notebook-to-run-sql-queries-83d5e548d82c ? Many thanks.

Answer 1

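A magic command prefixed with %% is a cell magic: it takes the rest of the cell as its argument. In sparkmagic-based kernels, %%local runs the cell on the local instance rather than on the cluster, which is the starting point for sending local data to the Spark cluster.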
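To make the "rest of the cell as its argument" idea concrete, here is a minimal plain-Python sketch of how a cell that starts with %% could be split into a magic name, an argument line, and a body. The helper name split_cell_magic and the file name offline.csv are hypothetical, and this is only an illustration, not how IPython or sparkmagic actually implement magics.

```python
def split_cell_magic(cell: str):
    """Split a notebook cell into (magic_name, argument_line, body).

    Hypothetical helper for illustration only.
    Returns (None, "", cell) when the cell does not start with a cell magic.
    """
    first_line, _, body = cell.partition("\n")
    if not first_line.startswith("%%"):
        return None, "", cell
    # Everything after "%%" up to the first space is the magic name;
    # the remainder of that first line is the magic's own argument line.
    name, _, arg_line = first_line[2:].partition(" ")
    return name, arg_line.strip(), body


# A cell as it might be typed in a notebook (offline.csv is a made-up file name).
cell = "%%local\nimport pandas as pd\ndf = pd.read_csv('offline.csv')"
name, args, body = split_cell_magic(cell)
print(name)  # local
print(body)  # the two Python lines that the magic receives as its argument
```

The point of the sketch is that the magic sees the whole cell body as text, so it can decide where that code runs (locally or on the cluster).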

Install databrickslabs_jupyterlab locally:

(base)$ conda create -n db-jlab python=3.8  # you might need to add "pywin32" if you are on Windows
(base)$ conda activate db-jlab
(db-jlab)$ pip install --upgrade "databrickslabs-jupyterlab[cli]==2.2.1"
(db-jlab)$ dj $PROFILE -k  # create a Jupyter kernel specification for the Databricks cluster

Start JupyterLab:

(db-jlab)$ dj $PROFILE -l

Test Spark access:

import socket

from databrickslabs_jupyterlab import is_remote

result = sc.range(10000).repartition(100).map(lambda x: x).sum()
print(socket.gethostname(), is_remote())
print(result)

For details, see "Install Jupyter Notebook on your computer and connect to Apache Spark on HDInsight", "Kernels for Jupyter Notebook on Apache Spark clusters in Azure HDInsight", and "Sending data to Spark cluster from Local instance".
