Edit - I am using Windows 10
Is there a faster alternative to pd._read_sql_query for a MS SQL database?
I was using pandas to read the data and add some columns and calculations on the data. I have cut out most of the alterations now and I am basically just reading (1-2 million rows per day at a time; my query is to read all of the data from the previous date) the data and saving it to a local database (Postgres).
The server I am connecting to is across the world and I have no privileges at all other than to query for the data. I want the solution to remain in Python if possible. I'd like to speed it up though and remove any overhead. Also, you can see that I am writing a file to disk temporarily and then opening it to COPY FROM STDIN. Is there a way to skip the file creation? It is sometimes over 500mb which seems like a waste.
engine = create_engine(engine_name)
query = 'SELECT * FROM {} WHERE row_date = %s;'
df = pd.read_sql_query(query.format(table_name), engine, params={query_date})
df.to_csv('../raw/temp_table.csv', index=False)
df= open('../raw/temp_table.csv')
process_file(conn=pg_engine, table_name=table_name, file_object=df)
UPDATE:
you can also try to unload data using bcp utility , which might be lot faster compared to pd.read_sql()
, but you will need a local installation of Microsoft Command Line Utilities for SQL Server
After that you can use PostgreSQL's COPY ... FROM ...
...
OLD answer:
you can try to write your DF directly to PostgreSQL (skipping the df.to_csv(...)
and df= open('../raw/temp_table.csv')
parts):
from sqlalchemy import create_engine
engine = create_engine(engine_name)
query = 'SELECT * FROM {} WHERE row_date = %s;'
df = pd.read_sql_query(query.format(table_name), engine, params={query_date})
pg_engine = create_engine('postgresql+psycopg2://user:password@host:port/dbname')
df.to_sql(table_name, pg_engine, if_exists='append')
Just test whether it's faster compared to COPY FROM STDIN
...
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.