简体   繁体   中英

Improving database query speed with Python

Edit - I am using Windows 10

Is there a faster alternative to pd._read_sql_query for a MS SQL database?

I was using pandas to read the data and add some columns and calculations on the data. I have cut out most of the alterations now and I am basically just reading (1-2 million rows per day at a time; my query is to read all of the data from the previous date) the data and saving it to a local database (Postgres).

The server I am connecting to is across the world and I have no privileges at all other than to query for the data. I want the solution to remain in Python if possible. I'd like to speed it up though and remove any overhead. Also, you can see that I am writing a file to disk temporarily and then opening it to COPY FROM STDIN. Is there a way to skip the file creation? It is sometimes over 500mb which seems like a waste.

engine = create_engine(engine_name)
query = 'SELECT * FROM {} WHERE row_date = %s;'
df = pd.read_sql_query(query.format(table_name), engine, params={query_date})
df.to_csv('../raw/temp_table.csv', index=False)
df= open('../raw/temp_table.csv')
process_file(conn=pg_engine, table_name=table_name, file_object=df)

UPDATE:

you can also try to unload data using bcp utility , which might be lot faster compared to pd.read_sql() , but you will need a local installation of Microsoft Command Line Utilities for SQL Server

After that you can use PostgreSQL's COPY ... FROM ... ...

OLD answer:

you can try to write your DF directly to PostgreSQL (skipping the df.to_csv(...) and df= open('../raw/temp_table.csv') parts):

from sqlalchemy import create_engine

engine = create_engine(engine_name)
query = 'SELECT * FROM {} WHERE row_date = %s;'
df = pd.read_sql_query(query.format(table_name), engine, params={query_date})

pg_engine = create_engine('postgresql+psycopg2://user:password@host:port/dbname')
df.to_sql(table_name, pg_engine, if_exists='append')

Just test whether it's faster compared to COPY FROM STDIN ...

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM