Python asynchronous read_sql in pandas
I want to speed up fetching data from a database by splitting the query into 4 parts. I wrote the following code using apply_async. However, calling get() raises a pickling error. What should I do? Thank you very much.
import datetime
from multiprocessing import Pool

import numpy as np
import pandas as pd
import pyodbc

pool = Pool(processes=4)
start_date = datetime.datetime(2017, 1, 1)
end_date = datetime.datetime(2017, 6, 30)
period = (end_date - start_date) / 4  # length of each of the 4 sub-ranges
conn = pyodbc.connect(
    r'DRIVER={SQL Server};'
    r'SERVER=abc;'
    r'PORT=111;'
    r'DATABASE=db;'
    r'UID=abc;'
    r'PWD=xyz;'
    r'TDS_Version=7.1'
)
res = []  # AsyncResult handles, one per sub-range
for p in np.arange(start_date, end_date, period).astype(datetime.datetime):
    sql = "SELECT * FROM db where date between '" + str(p) + "' and '" + str(p + period) + "'"
    res.append(pool.apply_async(lambda x: pd.read_sql(x[0], con=x[1]), ([sql, conn],)))  # runs in *only* one process
pool.close()
res[0].get()  # <------- PicklingError: Can't pickle <function <lambda> at 0x00000045566BDAE8>: attribute lookup <lambda>
You need to move the connection into each subprocess: replace your "lambda x..." with a module-level function that connects to the server and then sends the request. multiprocessing pickles the function and its arguments to hand them to the workers, and neither a lambda nor a pyodbc connection can be pickled. You cannot open a single connection and share it between the subprocesses.
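A minimal sketch of that approach, under a few assumptions: the connection string is just the placeholder from the question, and the names `split_range`, `fetch_chunk` and `load_parallel` are made up for illustration. It also swaps the string-built `BETWEEN` query for a parameterized half-open range (`>=` / `<`), so rows that fall exactly on a boundary are not fetched twice.

```python
import datetime
from multiprocessing import Pool

# Assumption: placeholder connection string copied from the question.
CONN_STR = (
    r"DRIVER={SQL Server};SERVER=abc;PORT=111;DATABASE=db;"
    r"UID=abc;PWD=xyz;TDS_Version=7.1"
)


def split_range(start, end, parts):
    """Split [start, end) into `parts` consecutive (lo, hi) intervals."""
    step = (end - start) / parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]


def fetch_chunk(bounds):
    """Runs inside a worker: open a private connection, query, return a DataFrame."""
    # Third-party imports are done here so each worker loads them itself.
    import pandas as pd
    import pyodbc

    lo, hi = bounds
    conn = pyodbc.connect(CONN_STR)
    try:
        # pyodbc uses "?" placeholders; the bounds are sent as parameters.
        return pd.read_sql(
            "SELECT * FROM db WHERE date >= ? AND date < ?", conn, params=[lo, hi]
        )
    finally:
        conn.close()


def load_parallel(start, end, parts=4):
    """Fetch each sub-range in its own process and concatenate the results."""
    import pandas as pd

    with Pool(processes=parts) as pool:
        # A module-level function is pickled by name, unlike a lambda.
        frames = pool.map(fetch_chunk, split_range(start, end, parts))
    return pd.concat(frames, ignore_index=True)
```

Call `load_parallel(datetime.datetime(2017, 1, 1), datetime.datetime(2017, 6, 30))` from under an `if __name__ == "__main__":` guard; on Windows the worker processes re-import the main module, so the guard is required there.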
Alternatively, you can replace pyodbc with aioodbc (https://github.com/aio-libs/aioodbc), which lets you implement what you need with asyncio.
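A rough sketch of what the aioodbc variant might look like, assuming the same placeholder connection settings as in the question. The helper names `fetch_chunk` and `fetch_all` are hypothetical, and the query is again a parameterized half-open range rather than the original `BETWEEN`. Each task opens its own connection, and `asyncio.gather` runs the chunk queries concurrently.

```python
import asyncio
import datetime

# Assumption: placeholder DSN mirroring the settings in the question.
DSN = (
    "DRIVER={SQL Server};SERVER=abc;PORT=111;DATABASE=db;"
    "UID=abc;PWD=xyz;TDS_Version=7.1"
)
QUERY = "SELECT * FROM db WHERE date >= ? AND date < ?"


async def fetch_chunk(lo, hi):
    """Open a dedicated connection, run one bounded query, return (columns, rows)."""
    import aioodbc  # third-party; imported lazily so the module loads without it

    conn = await aioodbc.connect(dsn=DSN)
    try:
        cur = await conn.cursor()
        await cur.execute(QUERY, lo, hi)
        rows = await cur.fetchall()
        columns = [col[0] for col in cur.description]
        await cur.close()
        return columns, [tuple(row) for row in rows]
    finally:
        await conn.close()


async def fetch_all(start, end, parts=4):
    """Run `parts` chunk queries concurrently and merge their rows in order."""
    step = (end - start) / parts
    bounds = [(start + i * step, start + (i + 1) * step) for i in range(parts)]
    results = await asyncio.gather(*(fetch_chunk(lo, hi) for lo, hi in bounds))
    columns = results[0][0]
    rows = [row for _, chunk in results for row in chunk]
    return columns, rows
```

The result can be turned into a DataFrame with `pd.DataFrame.from_records(rows, columns=columns)` after `columns, rows = asyncio.run(fetch_all(start_date, end_date))`. This version bypasses `pd.read_sql` entirely, since `read_sql` expects a synchronous connection.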