
Python SQL query execution time

I have little experience working with Python and SQL. I've been teaching myself in order to get my master's thesis done.

I just wrote a small script to benchmark around 50 identically structured databases, as follows:

import thesis, pyodbc

# SQL Server settings
drvr = '{SQL Server Native Client 10.0}'
host = 'host_directory'
user = 'username'
pswd = 'password'
table = 'tBufferAux' # Found (by inspection) to be the table containing relevant data
column = 'Data'

# Establish a connection to SQL Server
cnxn = pyodbc.connect(driver=drvr, server=host, uid=user, pwd=pswd) # Setup connection

endRow = 'SELECT TOP 1 ' + column + ' FROM [' # Query template for ending row
with open(thesis.db_metadata_path(),'w') as file:
    for db in thesis.db_list():
        # Prepare queries
        countRows_query = 'SELECT COUNT(*) FROM [' + db + '].dbo.' + table
        firstRow_query = endRow + db + '].dbo.' + table + ' ORDER BY ' + column + ' ASC'
        lastRow_query = endRow + db + '].dbo.' + table + ' ORDER BY ' + column + ' DESC'
        # Execute queries
        N_rows = cnxn.cursor().execute(countRows_query).fetchone()[0]
        first_row = cnxn.cursor().execute(firstRow_query).fetchone()
        last_row = cnxn.cursor().execute(lastRow_query).fetchone()
        # Save output to text file
        file.write(db + ' ' + str(N_rows) + ' ' + str(first_row.Data) + ' ' + str(last_row.Data) + '\n')

# Close session
cnxn.cursor().close()
cnxn.close()

I was surprised to find that this simple program takes almost 10 seconds to run, so I was wondering whether that is just normal or whether some part of my code may be slowing down the execution. (Keep in mind that the for loop runs only 56 times.)

Note that the functions from my custom thesis module have very little influence, since all of them are just variable assignments (except for thesis.db_list(), which is a quick .txt file read).

EDIT: This is the output .txt file generated by this program. The second column is the number of records in that table for each database.

  • timeit is good for measuring and comparing the performance of single statements and small code chunks (in IPython, the built-in %timeit magic makes this even easier); see the sketch right after this list.

  • Profilers break the measurement down by every function called (so they are more useful for larger amounts of code).

  • Note that a standalone program (all the more so one in an interpreted language) has startup (and shutdown) overhead.
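
For example, a single query can be timed in isolation like this. This is only a minimal sketch: the connection settings are the placeholders from the question, and 'SomeDB' stands in for one of the databases returned by thesis.db_list().

import timeit
import pyodbc

# Placeholder connection settings; reuse the ones from the script above
cnxn = pyodbc.connect(driver='{SQL Server Native Client 10.0}',
                      server='host_directory', uid='username', pwd='password')

# 'SomeDB' is a stand-in for one of the databases from thesis.db_list()
query = 'SELECT COUNT(*) FROM [SomeDB].dbo.tBufferAux'

# Run the query 10 times and report the average wall-clock time per execution
elapsed = timeit.timeit(lambda: cnxn.cursor().execute(query).fetchone(), number=10)
print('average per query: %.4f s' % (elapsed / 10))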

Taken together, 10 seconds doesn't look like very much for a program that accesses a database.

As a test, I wrapped your program in a profiler like this:

def main():
    <your program>

if __name__ == '__main__':
    import cProfile
    cProfile.run('main()')

And ran it from Cygwin's bash like this:

T1=`date +%T,%N`; /c/Python27/python.exe ./t.py; echo $T1; date +%T,%N

The resulting table listed connect as the single time hog (my machine is a very fast i7 3.9GHz/8GB with a local MSSQL and SSD as the system disk):

     7200 function calls (7012 primitive calls) in 0.058 seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
<...>
     1    0.003    0.003    0.058    0.058 t.py:1(main)
<...>
     1    0.043    0.043    0.043    0.043 {pyodbc.connect}
<...>
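
If the full table is too noisy, the stats can also be written to a file and printed sorted, so the most expensive calls come first. A small sketch ('profile.out' is just a placeholder file name):

import cProfile
import pstats

# Dump the raw stats to a file, then print the ten most expensive calls
# sorted by cumulative time ('profile.out' is a placeholder file name)
cProfile.run('main()', 'profile.out')
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)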

And the date commands showed that the program as a whole ran for around 300 ms, leaving about 250 ms of total overhead:

<...>:39,782700900
<...>:40,072717400

(By excluding python from the command line, I confirmed that other commands' overhead is negligible - about 7us)
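
To get the same kind of breakdown on your own machine without a profiler, you can bracket the two candidate costs (the connect and the query loop) with simple wall-clock timestamps. A minimal sketch, again with placeholder connection settings; the loop body is the one from your script:

import time
import pyodbc

t0 = time.time()
# Placeholder settings; substitute the ones from your script
cnxn = pyodbc.connect(driver='{SQL Server Native Client 10.0}',
                      server='host_directory', uid='username', pwd='password')
t_connect = time.time() - t0

t0 = time.time()
# ... the for loop over thesis.db_list() from the question goes here ...
t_queries = time.time() - t0

cnxn.close()
print('connect: %.3f s   queries: %.3f s' % (t_connect, t_queries))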
