Python: Using sqlite3 with multiprocessing
I have a SQLite3 DB. I need to parse 10,000 files. I read some data from each file, and then query the DB with this data to get a result. My code works fine in a single-process environment, but I get an error when trying to use a multiprocessing Pool.
My approach without multiprocessing (works OK):
1. Open DB connection object
2. for f in files:
       foo(f, x1=x1, x2=x2, ..., db=DB)
3. Close DB
My approach with multiprocessing (does NOT work):
1. Open DB
2. pool = multiprocessing.Pool(processes=4)
3. pool.map(functools.partial(foo, x1=x1, x2=x2, ..., db=DB), [files])
4. pool.close()
5. Close DB
I get the following error: sqlite3.ProgrammingError: Base Cursor.__init__ not called.
My DB class is implemented as follows:
import logging
import sqlite3
import sys

log = logging.getLogger(__name__)

def open_db(sqlite_file):
    """Open SQLite database connection.
    Args:
        sqlite_file -- File path
    Return:
        Connection
    """
    log.info('Open SQLite database %s', sqlite_file)
    try:
        conn = sqlite3.connect(sqlite_file)
    except sqlite3.Error as e:
        log.error('Unable to open SQLite database %s', e.args[0])
        sys.exit(1)
    return conn
def close_db(conn, sqlite_file):
    """Close SQLite database connection.
    Args:
        conn -- Connection
    """
    if conn:
        log.info('Close SQLite database %s', sqlite_file)
        conn.close()
class MapDB:

    def __init__(self, sqlite_file):
        """Initialize.
        Args:
            sqlite_file -- File path
        """
        # 1. Open database.
        # 2. Setup to receive data as dict().
        # 3. Get cursor to execute queries.
        self._sqlite_file = sqlite_file
        self._conn = open_db(sqlite_file)
        self._conn.row_factory = sqlite3.Row
        self._cursor = self._conn.cursor()

    def close(self):
        """Close DB connection."""
        if self._cursor:
            self._cursor.close()
        close_db(self._conn, self._sqlite_file)

    def check(self):
        ...

    def get_driver_net(self, net):
        ...

    def get_cell_id(self, net):
        ...
Function foo() looks like this:
def foo(f, x1, x2, db):
    # ... extract some data from file f ...
    r1 = db.get_driver_net(...)
    r2 = db.get_cell_id(...)
The overall implementation, which does not work, is as follows:
mapdb = MapDB(sqlite_file)
log.info('Create NetInfo objects')
pool = multiprocessing.Pool(processes=4)
files = [get list of files to process]
pool.map(functools.partial(foo, x1=x1, x2=x2, db=mapdb), files)
pool.close()
mapdb.close()
To fix this, I think I need to create the MapDB() object inside each pool worker (so there are 4 parallel/independent connections). But I'm not sure how to do this. Can someone show me an example of how to accomplish this with Pool?
What about defining foo like this:
def foo(f, x1, x2, db_path):
    mapdb = MapDB(db_path)   # ... open mapdb
    # ... process data ...
    mapdb.close()            # ... close mapdb
and then change your pool.map call to:
pool.map(functools.partial(foo, x1=x1, x2=x2, db_path="path-to-sqlite3-db"), files)
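A minimal runnable sketch of this per-call-connection pattern might look like the following. The `nets` table, the `build_demo_db` helper, and the `x1` multiplier are invented for illustration (the question's real MapDB queries are elided); the point is only that each call to foo opens and closes its own connection, so no sqlite3 object ever crosses a process boundary:

```python
import functools
import multiprocessing
import os
import sqlite3
import tempfile

def foo(f, x1, db_path):
    # Open a connection inside the worker call, not in the parent process.
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        row = conn.execute("SELECT value FROM nets WHERE name = ?", (f,)).fetchone()
        return (f, row["value"] * x1 if row else None)
    finally:
        conn.close()

def build_demo_db(db_path):
    # Populate a tiny demo table so the sketch is runnable end to end.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE nets (name TEXT PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO nets VALUES (?, ?)",
                     [("a", 1), ("b", 2), ("c", 3)])
    conn.commit()
    conn.close()

if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
    build_demo_db(db_path)
    files = ["a", "b", "c"]
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(functools.partial(foo, x1=10, db_path=db_path), files)
    print(results)  # [('a', 10), ('b', 20), ('c', 30)]
```

Opening a connection per file is cheap for SQLite; if it ever becomes a bottleneck, the connection can instead be opened once per worker via Pool's initializer hook.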
Update

Another option is to handle the worker threads yourself and distribute work via a Queue.
from queue import Queue   # Python 2: from Queue import Queue
from threading import Thread

q = Queue()

def worker():
    mapdb = ...  # open the sqlite database
    while True:
        item = q.get()
        if item[0] == "file":
            f = item[1]
            # ... process file f ...
            q.task_done()
        else:
            q.task_done()
            break
    # ... close sqlite connection ...

# Start up the workers
nworkers = 4
for i in range(nworkers):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

# Place work on the Queue
for x in ...:  # list of files to process
    q.put(("file", x))

# Place termination tokens onto the Queue
for i in range(nworkers):
    q.put(("end",))

# Wait for all work to be done.
q.join()
The termination tokens are used to ensure that the sqlite connections are closed, in case that matters.
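If you'd rather stay with multiprocessing.Pool, its initializer hook achieves the same per-worker-connection effect: each worker process opens its own connection exactly once, and foo reuses it. This is only a sketch; the module-level `_conn`, the `init_worker` helper, and the demo `nets` table are invented names, not part of the question's code:

```python
import multiprocessing
import os
import sqlite3
import tempfile

_conn = None  # one connection per worker process, set by the initializer

def init_worker(db_path):
    # Runs once in each worker process: open that worker's own connection.
    global _conn
    _conn = sqlite3.connect(db_path)
    _conn.row_factory = sqlite3.Row

def foo(f):
    # Query using this worker's private connection; nothing sqlite-related
    # ever crosses the process boundary.
    row = _conn.execute("SELECT value FROM nets WHERE name = ?", (f,)).fetchone()
    return (f, row["value"] if row else None)

def build_demo_db(db_path):
    # Tiny demo table so the sketch is runnable end to end.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE nets (name TEXT PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO nets VALUES (?, ?)", [("a", 1), ("b", 2)])
    conn.commit()
    conn.close()

if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
    build_demo_db(db_path)
    with multiprocessing.Pool(processes=2, initializer=init_worker,
                              initargs=(db_path,)) as pool:
        print(pool.map(foo, ["a", "b"]))  # [('a', 1), ('b', 2)]
```

The connections are closed implicitly when the worker processes exit; add an explicit cleanup step (like the termination tokens above) only if orderly shutdown matters.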