
Adding Linux ODBC drivers for SQL Server to a Kaggle/Python docker image

I am trying to build a data science machine capable of running machine learning Python programs in a production environment.

For the current business case, data needs to be pulled from SQL Server, scored with machine learning in Python, and pushed back to SQL Server.

I want to install the ODBC drivers for Linux.

https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server

The problem I have is: which drivers should I install?

When I connect to the container using the following command, it works fine and maps over my python programs in the poc directory.

"docker run -it --volume=c:\\docker\\poc:/poc kaggle/python /bin/bash"

When I try to figure out the version of Linux using the following command,

"uname -a"

I see it is Moby Linux from Docker?

Linux 69982a00af21 4.9.49-moby #1 SMP Wed Sep 27 00:36:29 UTC 2017 x86_64 GNU/Linux

However, what base image is that built from? (Ubuntu, Debian, Fedora, Red Hat, etc)

I need a crackerjack Unix admin to help me out. How do I get the ODBC drivers installed?

Any takers?

Thanks in advance for your help.

John

I'm not sure I have exactly the answer you're looking for, because you refer to the Microsoft ODBC driver. However, I will share what I've done to get UnixODBC installed (which I then use with the PyODBC module to talk to an MSSQL database).

Here is a summary of the commands I've used (caveat: untested in this format, as they're cut and pasted from a production Dockerfile that I can't share in its entirety):

Assuming you are inside a Debian/Ubuntu-based Linux Docker container
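To check that assumption before running apt-get: `uname -a` reports the kernel, which a container shares with the Docker host (here the Moby VM), so it says nothing about the image's distribution. The distribution is normally recorded in /etc/os-release inside the container. A minimal Python 3 sketch that reads it follows; the function names and the package-manager mapping are illustrative assumptions, not part of any library.

```python
import os


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        info[key] = value.strip().strip('"').strip("'")
    return info


def package_manager_hint(info):
    """Guess the package manager from ID / ID_LIKE (illustrative mapping only)."""
    ids = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    if "debian" in ids or "ubuntu" in ids:
        return "apt-get"
    if any(name in ids for name in ("rhel", "fedora", "centos")):
        return "yum"
    if "alpine" in ids:
        return "apk"
    return "unknown"


if __name__ == "__main__":
    path = "/etc/os-release"
    if os.path.exists(path):
        with open(path) as handle:
            info = parse_os_release(handle.read())
        print(info.get("PRETTY_NAME", "unknown distribution"))
        print("package manager:", package_manager_hint(info))
    else:
        print(path + " not found")
```

If the hint is apt-get, the commands below should apply as written. From a shell, `cat /etc/os-release` shows the same information.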

$ apt-get update
$ apt-get install python2.7 python-pip unixodbc unixodbc-dev freetds-bin freetds-dev tdsodbc nano

# nano is a text editor; Ctrl + O to write out, Ctrl + X to exit

$ nano /etc/odbcinst.ini

[FreeTDS]
Description = FreeTDS Driver for MSSQL
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so

$ nano /etc/freetds/freetds.conf

#   $Id: freetds.conf,v 1.12 2007-12-25 06:02:36 jklowden Exp $
#
# This file is installed by FreeTDS if no file by the same
# name is found in the installation directory.
#
# For information about the layout of this file and its settings,
# see the freetds.conf manpage "man freetds.conf".

# Global settings are overridden by those in a database
# server specific section
[global]
# TDS protocol version
tds version = 8.0

# Whether to write a TDSDUMP file for diagnostic purposes
# (setting this to /tmp is insecure on a multi-user system)
;dump file = /tmp/freetds.log
;debug flags = 0xffff

# Command and connection timeouts
;timeout = 10
;connect timeout = 10

# If you get out-of-memory errors, it may mean that your client
# is trying to allocate a huge buffer for a TEXT field.
# Try setting 'text size' to a more reasonable limit
text size = 64512

# If you experience TLS handshake errors and are using openssl,
# try adjusting the cipher list (don't surround in double or single quotes)
# openssl ciphers = HIGH:!SSLv2:!aNULL:-DH

$ pip install pyodbc

$ nano test_database.py

import pyodbc

host = 'some.host.org'
port = 1433
database = 'some_database'
username = 'some_username'
password = 'some_password'
timeout = 10  # login timeout in seconds

# Build the string piece by piece so that spaces inside values
# (for example in a password) are preserved.
conn_str = (
    'DRIVER={FreeTDS};'
    'TDS_VERSION=8.0;'
    'SERVER=%s;'
    'PORT=%i;'
    'DATABASE=%s;'
    'UID=%s;'
    'PWD=%s;'
) % (host, port, database, username, password)

connection = pyodbc.connect(conn_str, timeout=timeout)

connection.autocommit = True

cursor = connection.cursor()

cursor.execute('SELECT * FROM some_table;')

rows = cursor.fetchall()

for row in rows:
    print(row)  # works on both Python 2.7 and Python 3

cursor.close()

connection.close()


I must caveat again this is all cut and paste from our codebase and I haven't tested it as it appears right now.
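The question also asks about scoring the rows and pushing the results back. Here is a rough Python 3 sketch of that round trip, written against the generic DB-API calls that pyodbc provides (cursor(), execute(), fetchall(), executemany(), commit()) with `?` parameter markers. Everything specific is an assumption for illustration: the table some_table, its columns id, amount and score, and the score() function, which stands in for a real model. The demo uses the standard-library sqlite3 module as a stand-in so it runs without a SQL Server; with pyodbc you would pass the connection from pyodbc.connect() instead.

```python
import sqlite3


def score(amount):
    """Hypothetical stand-in for a real machine learning model."""
    return 1.0 if amount > 100 else 0.0


def score_and_write_back(connection):
    """Read rows, score them, and write the scores back in one batch."""
    cursor = connection.cursor()
    cursor.execute("SELECT id, amount FROM some_table")
    rows = cursor.fetchall()
    # Parameter order matches the placeholders in the UPDATE below.
    updates = [(score(row[1]), row[0]) for row in rows]
    cursor.executemany("UPDATE some_table SET score = ? WHERE id = ?", updates)
    connection.commit()
    cursor.close()
    return len(updates)


def demo():
    """Run the round trip against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE some_table (id INTEGER PRIMARY KEY, amount REAL, score REAL)"
    )
    conn.executemany(
        "INSERT INTO some_table (id, amount) VALUES (?, ?)",
        [(1, 50.0), (2, 250.0)],
    )
    updated = score_and_write_back(conn)
    result = conn.execute("SELECT id, score FROM some_table ORDER BY id").fetchall()
    conn.close()
    return updated, result


if __name__ == "__main__":
    print(demo())
```

Using parameter markers rather than string formatting keeps the scored values out of the SQL text itself.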

If you don't have access to an MSSQL database for testing, you can spin one up as a Docker container (SQL Server Express Edition) with the following command:

docker run -d --name mssql-server -p 1433:1433 -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=some_password' -e 'MSSQL_PID=Express' microsoft/mssql-server-linux:latest

Also note that the packages installed above include the "tsql" command line tool, which you can use to double-check your database connection outside of Python.
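Another check that needs nothing beyond the Python 3 standard library is to confirm that the [FreeTDS] entry in /etc/odbcinst.ini exists and that its Driver path points at a real file. The sketch below assumes the file is plain INI, as in the example above; registered_drivers is a hypothetical helper name, not part of unixODBC or pyodbc.

```python
import configparser
import os


def registered_drivers(ini_text):
    """Map each section name in odbcinst.ini to its Driver path (or None)."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    # Option names are matched case-insensitively by configparser.
    return {
        section: parser.get(section, "Driver", fallback=None)
        for section in parser.sections()
    }


if __name__ == "__main__":
    path = "/etc/odbcinst.ini"
    if not os.path.exists(path):
        print(path + " not found")
    else:
        try:
            with open(path) as handle:
                drivers = registered_drivers(handle.read())
        except configparser.Error as error:
            print("could not parse %s: %s" % (path, error))
        else:
            for name, library in drivers.items():
                found = bool(library) and os.path.exists(library)
                print(name, library, "ok" if found else "missing")
```

A "missing" result usually means the library lives under a different directory on your image, in which case the Driver line needs adjusting.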

Best of luck!

Ran into the same problem and assembled a Docker image based on this template:

FROM python:3.7.7-slim-stretch  

# Install pyodbc https://github.com/mkleehammer/pyodbc/wiki/Install
RUN apt-get update && apt-get install -y --no-install-recommends g++ unixodbc-dev curl gnupg apt-transport-https apt-utils
RUN pip install pyodbc

# Install Microsoft ODBC 17 
# https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver15
ENV ACCEPT_EULA=Y
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update && apt-get install -y --no-install-recommends msodbcsql17 mssql-tools

The best sources for assembling this solution were the pyodbc wiki and Microsoft's instructions for ODBC Driver 17. They led me to use FROM python:3.7.7-slim-stretch, a reasonably up-to-date base image (Debian 9) that both sets of instructions clearly document.
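For completeness, here is a sketch of how a Python 3 script inside that image might connect through the Microsoft driver. It assumes the driver is registered under the name "ODBC Driver 17 for SQL Server" (the name msodbcsql17 uses) and that none of the values contain semicolons or braces, which would need extra quoting. The helper names build_conn_str and connect, and all host and credential values, are placeholders.

```python
def build_conn_str(server, port, database, username, password,
                   driver="ODBC Driver 17 for SQL Server"):
    """Assemble an ODBC connection string for the Microsoft driver.

    The Microsoft driver takes the port after a comma in SERVER
    rather than as a separate PORT keyword.
    """
    parts = [
        ("DRIVER", "{%s}" % driver),
        ("SERVER", "%s,%d" % (server, port)),
        ("DATABASE", database),
        ("UID", username),
        ("PWD", password),
    ]
    return "".join("%s=%s;" % (key, value) for key, value in parts)


def connect(server, port, database, username, password, timeout=10):
    """Open a connection; pyodbc is imported here so the helper above
    can be used (and tested) on machines without pyodbc installed."""
    import pyodbc
    conn_str = build_conn_str(server, port, database, username, password)
    return pyodbc.connect(conn_str, timeout=timeout)


if __name__ == "__main__":
    # Placeholder values; printing the string is safe without a database.
    print(build_conn_str("some.host.org", 1433, "some_database",
                         "some_username", "some_password"))
```

Calling connect() with real values should return a pyodbc connection that can be used like the one in the FreeTDS example.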
