

Automatically install pyodbc on a Databricks cluster upon each restart

I have been using pyodbc on one of my Databricks clusters and have been installing it with these shell commands, run in the first cell of my notebook:

curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list 
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17
apt-get -y install unixodbc-dev
sudo apt-get install python3-pip -y
pip3 install --upgrade pyodbc

This works fine, but I have to run it every time I start the cluster and intend to use pyodbc. So far I have done this by making this code the first cell of every notebook that uses pyodbc. To fix this, I saved the code as a .sh file, uploaded it to DBFS, and added it as one of my cluster's init scripts. When I then run the following code:

import pyodbc

cnxn1 = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+jdbcHostname+';DATABASE='+jdbcDatabase+';UID='+username1+';PWD='+password1)

I get the following error:

('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)")

What am I doing wrong in my shell commands or init script that causes this error? Any help would be greatly appreciated. Thanks!
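This unixODBC message generally means the driver manager could not load the driver: either no section named "ODBC Driver 17 for SQL Server" is registered in odbcinst.ini, or the library that section's Driver= line points to does not exist. Below is a minimal diagnostic sketch that can be run in a notebook cell on the cluster. It assumes the default /etc/odbcinst.ini location used by the Ubuntu unixODBC packages, and check_odbc_driver is a hypothetical helper name.

```python
import configparser
import os


def check_odbc_driver(odbcinst_text: str, name: str) -> str:
    """Explain whether an ODBC driver is usable, given odbcinst.ini contents."""
    # Default ConfigParser lower-cases option names on read and lookup,
    # so "Driver", "driver" and "DRIVER" all match below.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(odbcinst_text)
    if not parser.has_section(name):
        return f"not registered: no [{name}] section"
    lib = parser.get(name, "Driver", fallback=None)
    if lib is None:
        return "registered, but the section has no Driver= line"
    if not os.path.isfile(lib):
        return f"registered, but library is missing: {lib}"
    return f"ok: {lib}"


if __name__ == "__main__":
    path = "/etc/odbcinst.ini"  # assumed default location on Ubuntu
    text = open(path).read() if os.path.exists(path) else ""
    print(check_odbc_driver(text, "ODBC Driver 17 for SQL Server"))
```

A "not registered" result points at the msodbcsql17 install step not having run (or not having finished) during cluster start.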

This is the recommended way to do it.

Create the script file from a notebook like this:

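# Write the init script to DBFS; the final True overwrites an existing file.
# The shebang must be the very first line, and every apt-get install gets -y
# because nobody can answer a confirmation prompt during cluster start.
dbutils.fs.put("dbfs:/databricks/scripts/pyodbc-install.sh","""#!/bin/bash
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install -y msodbcsql17
apt-get -y install unixodbc-dev
sudo apt-get install python3-pip -y
pip3 install --upgrade pyodbc""", True)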
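If the .sh file was written on Windows, CRLF line endings are one possible reason an init script behaves differently from the same commands typed into a notebook, and apt-get install without -y may abort at its Y/n prompt when no terminal is attached. The following is a minimal sketch, not a definitive fix: a hypothetical Python helper that generates the script with a bash shebang, set -euxo pipefail (so the first failing step stops the script and each command is echoed to the log), -y on every apt-get install, and Unix line endings. Reading the Ubuntu version from /etc/os-release instead of hard-coding 16.04 is also an assumption; check that Microsoft publishes a repo for the release your runtime uses.

```python
# Hypothetical helper that builds the init-script text shown above,
# adding a shebang, fail-fast flags and non-interactive apt-get options.
INIT_SCRIPT_STEPS = [
    "curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -",
    # Assumption: Microsoft publishes a repo for the node's Ubuntu release,
    # read here from /etc/os-release instead of hard-coding 16.04.
    "curl https://packages.microsoft.com/config/ubuntu/"
    "$(. /etc/os-release && echo $VERSION_ID)/prod.list"
    " > /etc/apt/sources.list.d/mssql-release.list",
    "apt-get update",
    "ACCEPT_EULA=Y apt-get install -y msodbcsql17",
    "apt-get install -y unixodbc-dev",
    "apt-get install -y python3-pip",
    "pip3 install --upgrade pyodbc",
]


def build_init_script(steps=INIT_SCRIPT_STEPS) -> str:
    """Return the script with Unix line endings and a bash shebang."""
    header = [
        "#!/bin/bash",
        # Stop at the first failing command and echo each command to the log
        "set -euxo pipefail",
    ]
    # Strip any carriage returns so bash never sees CRLF line endings
    body = [step.replace("\r", "") for step in steps]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    script = build_init_script()
    print(script)
    # On Databricks (dbutils exists only there):
    # dbutils.fs.put("dbfs:/databricks/scripts/pyodbc-install.sh", script, True)
```

Generating the file from Python rather than uploading a hand-edited .sh keeps the line endings under your control.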

Then go to your cluster's configuration page.

Click Edit.


Scroll down and expand Advanced Options > Init Scripts.

There, add the path of the script (dbfs:/databricks/scripts/pyodbc-install.sh).

Then click Confirm.
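The same setting can also be expressed in the cluster's JSON specification. This is a sketch, assuming the Databricks Clusters API's init_scripts field with a dbfs destination; the helper name and the cluster name in the usage are hypothetical, and sending the resulting spec to the Clusters API or CLI is not shown.

```python
import json


def with_init_script(cluster_spec: dict, path: str) -> dict:
    """Return a copy of a cluster spec with a DBFS init script appended once."""
    spec = dict(cluster_spec)  # shallow copy: the caller's dict is not modified
    scripts = list(spec.get("init_scripts", []))
    entry = {"dbfs": {"destination": path}}
    if entry not in scripts:  # avoid registering the same script twice
        scripts.append(entry)
    spec["init_scripts"] = scripts
    return spec


if __name__ == "__main__":
    spec = with_init_script({"cluster_name": "my-cluster"},  # hypothetical name
                            "dbfs:/databricks/scripts/pyodbc-install.sh")
    print(json.dumps(spec, indent=2))
```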

Now the script runs every time the cluster starts, so pyodbc is available in every notebook attached to it.
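To confirm after a restart that the init script actually registered the driver, a small check can be run in a notebook cell before connecting. It is a sketch: pyodbc.drivers() returns the driver names unixODBC knows about, while the helper name and the wording of its message are hypothetical.

```python
REQUIRED_DRIVER = "ODBC Driver 17 for SQL Server"


def missing_driver_message(available, required=REQUIRED_DRIVER):
    """Return None if the driver is present, else a hint listing what was found."""
    if required in available:
        return None
    found = ", ".join(available) or "none"
    return (f"{required!r} is not registered with unixODBC (found: {found}); "
            "check the init script output")


if __name__ == "__main__":
    try:
        import pyodbc
        print(missing_driver_message(pyodbc.drivers()) or "driver available")
    except ImportError:
        print("pyodbc is not installed on this cluster")
```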

Is this how you did it?
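Separately from the install problem, the connection string in the question is built by plain concatenation, which breaks if a value such as the password contains ; or braces. Below is a minimal sketch of a safer builder, assuming the usual ODBC convention that such values are wrapped in braces with any closing brace doubled. build_conn_str and odbc_value are hypothetical helpers, and the server, database and credentials in the usage are placeholders.

```python
def odbc_value(value: str) -> str:
    """Quote a connection-string value only when it contains ; { or }."""
    if any(ch in value for ch in ";{}"):
        # Wrap in braces and double any closing brace inside the value
        return "{" + value.replace("}", "}}") + "}"
    return value


def build_conn_str(driver: str, server: str, database: str,
                   uid: str, pwd: str) -> str:
    """Assemble a SQL Server ODBC connection string."""
    parts = {
        "DRIVER": "{" + driver + "}",  # driver names are conventionally braced
        "SERVER": odbc_value(server),
        "DATABASE": odbc_value(database),
        "UID": odbc_value(uid),
        "PWD": odbc_value(pwd),
    }
    return ";".join(f"{key}={val}" for key, val in parts.items())


if __name__ == "__main__":
    conn_str = build_conn_str("ODBC Driver 17 for SQL Server",
                              "myserver.database.windows.net", "mydb",
                              "user1", "p;ss}word")
    print(conn_str)
    # With pyodbc installed: cnxn1 = pyodbc.connect(conn_str)
```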
