
Improving database query speed with Python

Edit - I am using Windows 10

Is there a faster alternative to pd.read_sql_query for an MS SQL database?

I was using pandas to read the data and add some columns and calculations to it. I have since cut out most of those alterations; now I am basically just reading the data (1-2 million rows per day at a time; my query reads all of the data from the previous date) and saving it to a local database (Postgres).

The server I am connecting to is across the world, and I have no privileges at all other than to query for the data. I want the solution to remain in Python if possible, and I'd like to speed it up and remove any overhead. Also, you can see that I am writing a file to disk temporarily and then opening it to COPY FROM STDIN. Is there a way to skip the file creation? It is sometimes over 500 MB, which seems like a waste.

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(engine_name)
query = 'SELECT * FROM {} WHERE row_date = %s;'
df = pd.read_sql_query(query.format(table_name), engine, params=(query_date,))
df.to_csv('../raw/temp_table.csv', index=False)    # dump the day's data to a temporary CSV
df = open('../raw/temp_table.csv')                 # reopen it as a file object for COPY FROM STDIN
process_file(conn=pg_engine, table_name=table_name, file_object=df)
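
One way to skip the file creation (a minimal sketch, assuming process_file accepts any file-like object, which COPY FROM STDIN does) is to write the CSV into an in-memory buffer instead of onto disk:

import io

buffer = io.StringIO()
df.to_csv(buffer, index=False)    # write the CSV into memory instead of to a file
buffer.seek(0)                    # rewind so COPY reads from the first row
process_file(conn=pg_engine, table_name=table_name, file_object=buffer)

Note that this keeps the whole CSV (sometimes over 500 MB) in RAM rather than on disk.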

UPDATE:

You can also try to unload the data using the bcp utility, which might be a lot faster compared to pd.read_sql(), but you will need a local installation of the Microsoft Command Line Utilities for SQL Server.
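
For example, a hypothetical bcp invocation from Python (the server name, database, credentials, and output path are placeholders):

import subprocess

bcp_cmd = [
    'bcp',
    "SELECT * FROM {} WHERE row_date = '{}'".format(table_name, query_date),
    'queryout', 'temp_table.csv',
    '-S', 'remote_server', '-d', 'source_db',    # MS SQL host and database (placeholders)
    '-U', 'user', '-P', 'password',
    '-c',                                        # character (text) mode
    '-t', ',',                                   # comma field terminator
]
subprocess.run(bcp_cmd, check=True)              # raises if bcp exits with an error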

After that you can use PostgreSQL's COPY ... FROM ...
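
A rough sketch of that step with psycopg2 (the connection string and file name are placeholders, and the COPY options have to match whatever delimiter bcp wrote):

import psycopg2

pg_conn = psycopg2.connect('postgresql://user:password@localhost:5432/dbname')
with pg_conn, pg_conn.cursor() as cur, open('temp_table.csv') as f:
    # stream the exported file straight into the Postgres table
    cur.copy_expert('COPY {} FROM STDIN WITH (FORMAT csv)'.format(table_name), f)
pg_conn.close()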

OLD answer:

You can try to write your DF directly to PostgreSQL (skipping the df.to_csv(...) and df = open('../raw/temp_table.csv') parts):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(engine_name)
query = 'SELECT * FROM {} WHERE row_date = %s;'
df = pd.read_sql_query(query.format(table_name), engine, params=(query_date,))

pg_engine = create_engine('postgresql+psycopg2://user:password@host:port/dbname')
df.to_sql(table_name, pg_engine, if_exists='append', index=False)   # index=False avoids writing the DataFrame index as an extra column

Just test whether it's faster compared to COPY FROM STDIN ...
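
If you stick with to_sql, pandas also accepts chunksize and method arguments that are worth testing; whether they beat COPY FROM STDIN depends on the driver and the data:

df.to_sql(table_name, pg_engine, if_exists='append', index=False,
          chunksize=10000, method='multi')    # batch rows and use multi-row INSERT statements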
