
python parallelize db query execution

I have the following query that returns data between a pair of dates.

dates = ['20100101', '20100201', '20100301', '20100401']

query = "select date, company_name, total_amount from info_stats where date between '{start_date}' and '{end_date}'"

I obtain the date boundaries from another process and build a list so that I can iterate over consecutive pairs as follows:

pds = []
for idx in range(0, len(dates) - 1):
    # dates are already 'YYYYMMDD' strings, so no strftime is needed
    formated_query = self.get_formated_query(start_date=dates[idx],
                                             end_date=dates[idx + 1])
    pds.append(pd.read_sql(sql=formated_query, con=db_connect))

To each query I pass the date at index `idx` as the start and the date at `idx + 1` (the next, later date) as the end.

These queries take a very long time, and I want to execute them in parallel so that the wait is shorter. I went over joblib but am not sure whether it uses multi-threading or multi-processing; it looks like the former. I am also new to joblib. How can I parallelize the above code using joblib or another package?

The question is quite broad, but I can share my own experience with parallelising queries against databases.

What I found is that if I have many small jobs, I can get speed-ups by using Python's built-in concurrency modules such as concurrent.futures.

However, if I have big jobs that take a long time to run on the database, parallelising does not help. This is because the database engine itself (in my case SQL Server) already does a splendid job of parallelising each query. A single big job already maximises the number of processes the server can handle, so submitting more jobs at once won't help. Your situation seems to be this one.

