Using the 'multiprocessing' library in Python 3 for PostgreSQL queries
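I have the following code in Python that executes 5 queries in a row. The average runtime per query is roughly 181.1 seconds (about 3 minutes), and the total runtime for all 5 queries is 905.4 seconds (about 15 minutes). Eventually, after loading the data into DataFrames, I am going to perform ETL work (mostly looking for errors, data quality issues, and inconsistencies), but before that I want to try to use multiprocessing to cut the runtime. I am not familiar with multiprocessing in Python, so I have been reading about the different approaches (Queue vs. Pool, etc.). I am curious which approach would work best for this workflow and how I would implement it. Ideally, a multiprocess version of this code, or a guide for getting there, would be great.
Thank you.
EDIT: In case I wasn't clear, I want to run all 5 queries concurrently. What might be an issue is adding each DataFrame to the list at the same time, so I am willing to forgo that if necessary.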
import pandas as pd
import psycopg2
import time
import os
host = os.environ["DBHOST"]
user = os.environ["DBUSER"]
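password = os.environ["DBPWD"]  # renamed: `pass` is a reserved word in Python
db_conn = psycopg2.connect(
    "host='{}' port={} dbname='{}' user={} password={}".format(
        host,
        port,  # the port number is elided in the original ("port#"); supply your own
        "db_name",
        user,
        password))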
query_load = [("SELECT column_name_1, COUNT(*) "
"FROM schema.table "
"GROUP BY column_name_1 "
"ORDER BY column_name_1 ASC"),
("SELECT column_name_2, COUNT(*) "
"FROM schema.table "
"GROUP BY column_name_2 "
"ORDER BY column_name_2 ASC"),
("SELECT column_name_3, COUNT(*) "
"FROM schema.table "
"GROUP BY column_name_3 "
"ORDER BY column_name_3 ASC"),
("SELECT column_name_4, COUNT(*) "
"FROM schema.table "
"GROUP BY column_name_4 "
"ORDER BY column_name_4 ASC"),
("SELECT column_name_5, COUNT(*) "
"FROM schema.table "
"GROUP BY column_name_5 "
"ORDER BY column_name_5 ASC")]
start_time = time.time()
data_load = []
# Run the queries one after another and collect one DataFrame per query
for query in query_load:
    data_load.append(pd.read_sql(query, db_conn))
elapsed_time = time.time() - start_time
print("Job finished in {} seconds".format(elapsed_time))
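Since you already have a collection of queries, we can write a function that takes one query at a time; by using Pool.map, the queries can then run concurrently: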
from multiprocessing import Pool
import pandas as pd
import time
# define query_load
# define db_conn
def read_sql(query):
    return pd.read_sql(query, db_conn)

if __name__ == '__main__':
    start_time = time.time()
    with Pool(5) as p:
        data_load = p.map(read_sql, query_load)
    elapsed_time = time.time() - start_time
    print("Job finished in {} seconds".format(elapsed_time))
    # carry on re-processing data_load
Now, I am assuming that db_conn will allow concurrent requests.
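That assumption may not hold: the psycopg2 documentation advises against using a connection inside a forked process and recommends creating connections after the fork. A minimal sketch of one way around this, assuming each worker opens its own connection through the Pool initializer, is below. The names `pg_connect`, `init_worker`, `fetch_rows`, and `run_all` are hypothetical, the environment variables are the ones from the question, and the port is left out because the question elides it.

```python
import os
from multiprocessing import Pool

# One connection per worker process; set by init_worker after the process starts.
_conn = None


def pg_connect():
    """Open a fresh PostgreSQL connection from the question's environment variables."""
    import psycopg2  # imported lazily, only where a connection is actually opened

    return psycopg2.connect(
        host=os.environ["DBHOST"],
        user=os.environ["DBUSER"],
        password=os.environ["DBPWD"],
        dbname="db_name",  # placeholder name, as in the question
        # port omitted: the question elides it, so the client default applies (assumption)
    )


def init_worker(connect):
    """Pool initializer: runs once in each worker and stores that worker's connection."""
    global _conn
    _conn = connect()


def fetch_rows(query):
    """Run one query on this worker's connection and return all rows."""
    cur = _conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        cur.close()


def run_all(queries, connect=pg_connect, workers=5):
    """Run the queries in a pool whose workers each hold a private connection."""
    with Pool(workers, initializer=init_worker, initargs=(connect,)) as pool:
        return pool.map(fetch_rows, queries)
```

Calling `run_all(query_load)` under `if __name__ == '__main__':` would return one list of row tuples per query, each of which can be passed to `pd.DataFrame`. If DataFrames are wanted directly, the body of `fetch_rows` could instead return `pd.read_sql(query, _conn)`.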
Also note that p.map takes care of collecting the results and loading them into a list.
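This also addresses the concern in the question about appending DataFrames to a list at the same time: the workers never touch a shared list, and `Pool.map` returns the results in the order of its inputs regardless of which worker finishes first. A small sketch of that behavior follows, using a hypothetical `slow_square` function as a stand-in for a query, where smaller inputs deliberately take longer.

```python
import time
from multiprocessing import Pool


def slow_square(n):
    """Hypothetical stand-in for a query: smaller inputs take longer to finish."""
    time.sleep(max(0.0, 0.05 * (5 - n)))
    return n * n


def run(inputs, workers=5):
    """Map slow_square over inputs in a pool; results come back in input order."""
    with Pool(workers) as pool:
        return pool.map(slow_square, inputs)
```

Calling `run([1, 2, 3, 4, 5])` under `if __name__ == '__main__':` finishes the work for 5 first and for 1 last, yet the returned list still lines up with the inputs position by position. In the answer's code this means `data_load[i]` corresponds to `query_load[i]`.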