I'm currently trying to tune the performance of a few of my scripts, and the bottleneck is always the actual insert into the DB (MSSQL) with the pandas to_sql function.
One factor which plays into this is MSSQL's limit of 2100 parameters per statement.
I establish my connection with SQLAlchemy (using the mssql+pyodbc dialect):
import sqlalchemy
# params is the URL-encoded ODBC connection string (e.g. built with urllib.parse.quote_plus)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params, fast_executemany=True)
When inserting I use method="multi" together with a chunksize, so that I stay below the parameter limit:
dataframe_audit.to_sql(name="Audit", con=connection, if_exists="append",
                       method="multi", chunksize=50, index=False)
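For context, with method="multi" each chunk sends rows_per_chunk × n_columns parameters, so the chunk size has to be chosen against the 2100-parameter limit. A minimal sketch of that arithmetic (safe_chunksize is a hypothetical helper, not a pandas API):

```python
# MSSQL allows at most 2100 parameters per batch. With method="multi",
# each chunk sends rows_per_chunk * n_columns parameters, so the chunk
# size must keep that product strictly below the limit.
PARAM_LIMIT = 2100

def safe_chunksize(n_columns: int, param_limit: int = PARAM_LIMIT) -> int:
    """Largest rows-per-chunk that keeps rows * columns strictly below the limit."""
    return max(1, (param_limit - 1) // n_columns)

# e.g. a 40-column DataFrame could use chunks of up to 52 rows:
# 52 * 40 = 2080 parameters < 2100
```

This also shows why chunksize=50 is conservative for narrow tables: a 10-column DataFrame could safely use chunks of around 200 rows.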
This leads to extremely inconsistent performance: insert times vary widely from run to run, and I'm not sure what to make of that.
Any ideas to get better insert performance for my DataFrames?
If you are using the most recent version of pyodbc with ODBC Driver 17 for SQL Server and fast_executemany=True in your SQLAlchemy create_engine call, then you should be using method=None (the default) in your to_sql call. That will allow pyodbc to use an ODBC parameter array and give you the best performance under that setup. You will not hit the SQL Server stored procedure limit of 2100 parameters (unless your DataFrame has ~2100 columns). The only limit you would face would be if your Python process does not have sufficient memory available to build the entire parameter array before sending it to the SQL Server.
The method='multi' option for to_sql is only applicable to pyodbc when using an ODBC driver that does not support parameter arrays (e.g., FreeTDS ODBC). In that case, fast_executemany=True will not help and may actually cause errors.