
Automatically install pyodbc on a Databricks cluster upon each restart

I have been using pyodbc on one of my Databricks clusters, installing it with these shell commands in the first cell of my notebook:

curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list 
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17
apt-get -y install unixodbc-dev
sudo apt-get install python3-pip -y
pip3 install --upgrade pyodbc

This works fine, but I have to run it every time the cluster starts and I want to use pyodbc. So far I have handled this by making this code the first cell of every notebook that uses pyodbc. To avoid that, I saved the commands as a .sh file, uploaded it to DBFS, and added it as one of my cluster's init scripts. When I then run the code below:

cnxn1 = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+jdbcHostname+';DATABASE='+jdbcDatabase+';UID='+username1+';PWD='+ password1)

I get the following error:

('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)")

What am I doing wrong in my shell commands or init script that causes this error? Any help would be greatly appreciated. Thanks!
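The unixODBC message above generally means the driver manager could not find a driver registered under the name `ODBC Driver 17 for SQL Server`, which usually points to `msodbcsql17` never having been installed. unixODBC keeps its driver registry in an INI file, typically `/etc/odbcinst.ini` (that default path is an assumption for your image; `odbcinst -j` prints the actual location). A minimal standard-library sketch for listing the registered driver names; `registered_drivers` and `driver_is_registered` are hypothetical helpers, and the sample INI content is illustrative only:

```python
import configparser
from pathlib import Path
from typing import List


def registered_drivers(ini_text: str) -> List[str]:
    """Return the driver names (section headers) found in odbcinst.ini content."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    # Every section except the optional [ODBC] options block names a driver.
    return [name for name in parser.sections() if name.upper() != "ODBC"]


def driver_is_registered(name: str, ini_path: str = "/etc/odbcinst.ini") -> bool:
    """Check whether a driver section exists in the given (assumed) odbcinst.ini."""
    path = Path(ini_path)
    if not path.exists():
        return False
    return name in registered_drivers(path.read_text())


# Illustrative content; the Driver= path on a real node may differ.
sample = """
[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.so
UsageCount=1
"""
print(registered_drivers(sample))  # ['ODBC Driver 17 for SQL Server']
```

Running `driver_is_registered("ODBC Driver 17 for SQL Server")` in a notebook on the cluster gives a quick yes/no on whether the init script's driver install actually took effect.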

This is the recommended way of doing it.

Create the script file from a notebook like this:

dbutils.fs.put("dbfs:/databricks/scripts/pyodbc-install.sh","""#!/bin/bash
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
# -y is required: nothing can answer apt-get's confirmation prompt in an init script
ACCEPT_EULA=Y apt-get install -y msodbcsql17
apt-get -y install unixodbc-dev
# init scripts already run as root, so sudo is not needed
apt-get install -y python3-pip
pip3 install --upgrade pyodbc""", True)
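Because an init script runs unattended, a few details are easy to get wrong. `apt-get install` without `-y` stops at its confirmation prompt and aborts when there is no terminal to answer it, so `msodbcsql17` may silently never be installed, which would produce exactly the "Can't open lib" error. A file edited on Windows may also carry CRLF line endings that bash does not accept. A minimal pre-upload check, assuming you keep the script text in a Python string before passing it to `dbutils.fs.put`; `check_init_script` is a hypothetical helper, not a Databricks API, and it only recognizes the common `-y`, `--yes` and `--assume-yes` spellings:

```python
import re
from typing import List

# apt-get confirmation flags this sketch recognizes (combined short flags like -qy are not).
_YES_FLAG = re.compile(r"(^|\s)(-y|--yes|--assume-yes)(\s|$)")
_APT_INSTALL = re.compile(r"\bapt-get\b.*\binstall\b")


def check_init_script(text: str) -> List[str]:
    """Return a list of likely problems in a cluster init script's text."""
    problems = []
    if "\r\n" in text:
        problems.append("CRLF line endings; save the file with LF endings")
    if not text.startswith("#!"):
        problems.append("no shebang on the first line (e.g. #!/bin/bash)")
    for lineno, line in enumerate(text.splitlines(), start=1):
        # An unattended `apt-get install` needs a yes flag to skip the prompt.
        if _APT_INSTALL.search(line) and not _YES_FLAG.search(line):
            problems.append(f"line {lineno}: apt-get install without -y")
    return problems


script = "ACCEPT_EULA=Y apt-get install msodbcsql17\napt-get -y install unixodbc-dev\n"
for problem in check_init_script(script):
    print(problem)
```

For that two-line script the check reports the missing shebang and `line 1: apt-get install without -y`, while the `unixodbc-dev` line passes.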

Then go to your cluster's configuration page.

Click Edit.

Scroll down and expand Advanced Options > Init Scripts.

There you can add the path of the script (dbfs:/databricks/scripts/pyodbc-install.sh).

Then you can click on Confirm.
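The same setting can also be expressed in a cluster specification, for example when creating or editing the cluster through the Databricks Clusters REST API instead of the UI. A minimal sketch that builds only the `init_scripts` fragment; the rest of the spec (cluster name, runtime version, node type) is omitted and would have to be filled in for your workspace. Note that Databricks has since deprecated DBFS-hosted init scripts in favor of workspace files or volumes, so check the current documentation before relying on the `dbfs` form. `init_script_spec` is a hypothetical helper:

```python
import json


def init_script_spec(dbfs_path: str) -> dict:
    """Build the init_scripts fragment of a cluster spec for a DBFS-hosted script."""
    if not dbfs_path.startswith("dbfs:/"):
        raise ValueError("expected a dbfs:/ path")
    # Each entry names one script; scripts run in list order at cluster start.
    return {"init_scripts": [{"dbfs": {"destination": dbfs_path}}]}


spec = init_script_spec("dbfs:/databricks/scripts/pyodbc-install.sh")
print(json.dumps(spec, indent=2))
```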

This script will now run every time the cluster starts (restart the cluster if it is already running), making pyodbc and the SQL Server driver available to every notebook attached to it.
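Once the cluster has restarted, `pyodbc.drivers()` returns the names of the ODBC drivers the driver manager knows about, which is a quick way to confirm that `ODBC Driver 17 for SQL Server` is present before connecting. The sketch below picks the newest installed SQL Server driver from such a list and builds the connection string with ODBC brace-quoting (wrap each value in `{}` and double any `}`), so a password containing `;` or `}` cannot break the string the way plain concatenation can. `pick_sql_server_driver` and `build_conn_str` are hypothetical helpers, the server and credential values are placeholders, and the pyodbc call appears only in a comment so the sketch runs without pyodbc installed:

```python
import re
from typing import Iterable, Optional

_DRIVER_RE = re.compile(r"^ODBC Driver (\d+) for SQL Server$")


def pick_sql_server_driver(drivers: Iterable[str]) -> Optional[str]:
    """Return the highest-versioned 'ODBC Driver N for SQL Server', or None."""
    best, best_version = None, -1
    for name in drivers:
        match = _DRIVER_RE.match(name)
        if match and int(match.group(1)) > best_version:
            best, best_version = name, int(match.group(1))
    return best


def _quote(value: str) -> str:
    # Brace-quote the value; a literal '}' inside it is written as '}}'.
    return "{" + value.replace("}", "}}") + "}"


def build_conn_str(driver: str, server: str, database: str, uid: str, pwd: str) -> str:
    """Join brace-quoted keyword=value pairs with ';'."""
    parts = {"DRIVER": driver, "SERVER": server, "DATABASE": database, "UID": uid, "PWD": pwd}
    return ";".join(f"{key}={_quote(val)}" for key, val in parts.items())


driver = pick_sql_server_driver(["SQL Server Native Client 11.0", "ODBC Driver 17 for SQL Server"])
print(driver)  # ODBC Driver 17 for SQL Server
conn_str = build_conn_str(driver, "myserver.example.com", "mydb", "user", "p;ss}word")
# On the cluster:
#   driver = pick_sql_server_driver(pyodbc.drivers())
#   cnxn1 = pyodbc.connect(build_conn_str(driver, jdbcHostname, jdbcDatabase, username1, password1))
```

If `pick_sql_server_driver(pyodbc.drivers())` returns `None`, the init script did not install the driver, and the cluster's init script logs are the next place to look.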

Is this how you did it?
