Automatically install pyodbc on a Databricks cluster upon each restart

Question

I have been using pyodbc on one of my Databricks clusters and have been installing it with these shell commands, run in the first cell of my notebook:

curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list 
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17
apt-get -y install unixodbc-dev
sudo apt-get install python3-pip -y
pip3 install --upgrade pyodbc

This works fine, but I have to run it every time I start the cluster and want to use pyodbc, so I have been including it as the first cell of every notebook that uses pyodbc. To fix this, I saved the code as a .sh file, uploaded it to DBFS, and added it as one of my cluster's init scripts. With that in place, running the code below:

cnxn1 = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+jdbcHostname+';DATABASE='+jdbcDatabase+';UID='+username1+';PWD='+ password1)

I get the following error:

('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)")

What am I doing wrong with my shell commands or init script that causes this error? Any help would be greatly appreciated. Thanks!

Answer 1

The usual way to do this is with a cluster-scoped init script.

Create the script file like this:

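dbutils.fs.put("dbfs:/databricks/scripts/pyodbc-install.sh", """#!/bin/bash
# Register the Microsoft package signing key and apt repository.
# Use the prod.list that matches the cluster's Ubuntu release (16.04 here).
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
# Init scripts run non-interactively as root, so pass -y and drop sudo.
ACCEPT_EULA=Y apt-get install -y msodbcsql17
apt-get install -y unixodbc-dev
apt-get install -y python3-pip
pip3 install --upgrade pyodbc
""", True)  # True = overwrite an existing file at that path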

Then go to your cluster configuration page and click Edit.

Scroll down and expand Advanced Options > Init Scripts.

There you can add the path of the script (dbfs:/databricks/scripts/pyodbc-install.sh in this example).

Then click Confirm.

The script will now run every time the cluster starts, making pyodbc available to all notebooks attached to it.
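
If the error from the question still appears after a restart, it may be worth checking whether the driver was actually registered. unixODBC resolves the name in DRIVER={...} against the sections of odbcinst.ini, and the msodbcsql17 package normally adds its section there on install, so a missing section suggests the init script did not complete. Below is a minimal sketch of that check. The helper name registered_odbc_drivers is hypothetical, and the default path /etc/odbcinst.ini is an assumption (running odbcinst -j on the cluster prints the location actually in use).

```python
import configparser


def registered_odbc_drivers(ini_path="/etc/odbcinst.ini"):
    """Return the driver names registered with unixODBC in ini_path.

    Hypothetical helper for illustration; the default path is an assumption.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names as written in the file
    parser.read(ini_path)     # a missing file is skipped and yields no sections
    # An [ODBC] section, if present, holds settings such as tracing, not a driver.
    return [name for name in parser.sections() if name != "ODBC"]


drivers = registered_odbc_drivers()
print(drivers)
print("ODBC Driver 17 for SQL Server" in drivers)
```

If the second line printed is False, the name used in the connection string is not registered on the node where the notebook runs, and the init script output is the next place to look.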

Is that how you did it?
