peewee.OperationalError: too many SQL variables on upsert of only 150 rows * 8 columns
With the below example, on my machine, setting range(150) leads to the error, while range(100) does not:
from peewee import *

database = SqliteDatabase(None)

class Base(Model):
    class Meta:
        database = database

colnames = ["A", "B", "C", "D", "E", "F", "G", "H"]
cols = {x: TextField() for x in colnames}
table = type('mytable', (Base,), cols)

database.init('test.db')
database.create_tables([table])

data = []
for x in range(150):
    data.append({x: 1 for x in colnames})

with database.atomic() as txn:
    table.insert_many(data).upsert().execute()
Leads to:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3213, in execute
cursor = self._execute()
File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 2628, in _execute
return self.database.execute_sql(sql, params, self.require_commit)
File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3461, in execute_sql
self.commit()
File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3285, in __exit__
reraise(new_type, new_type(*exc_args), traceback)
File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3454, in execute_sql
cursor.execute(sql, params or ())
peewee.OperationalError: too many SQL variables
This seems very low to me. I am trying to use peewee to replace existing pandas-based SQL construction, because pandas lacks support for a primary key. Only being able to insert ~100 records per loop iteration is a very low ceiling, and a fragile one if the number of columns goes up some day.

How can I make this work better? Is it possible?
After some investigation, the problem appears to be related to the maximum number of parameters that a SQL query can have: SQLITE_MAX_VARIABLE_NUMBER. Its default value is 999, and each row consumes one bound parameter per column, so with 8 columns 150 rows need 1200 variables (over the limit), while 100 rows need only 800 (under it).

To be able to do big bulk inserts, I first estimate SQLITE_MAX_VARIABLE_NUMBER and then use it to split the list of dictionaries I want to insert into chunks. To estimate the value I use this function, inspired by this answer:
def max_sql_variables():
    """Get the maximum number of arguments allowed in a query by the current
    sqlite3 implementation. Based on `this question`_

    Returns
    -------
    int
        inferred SQLITE_MAX_VARIABLE_NUMBER
    """
    import sqlite3
    db = sqlite3.connect(':memory:')
    cur = db.cursor()
    cur.execute('CREATE TABLE t (test)')
    # Binary search for the largest number of bound parameters that a
    # single INSERT will accept.
    low, high = 0, 100000
    while (high - 1) > low:
        guess = (high + low) // 2
        query = 'INSERT INTO t VALUES ' + ','.join(['(?)' for _ in range(guess)])
        args = [str(i) for i in range(guess)]
        try:
            cur.execute(query, args)
        except sqlite3.OperationalError as e:
            if "too many SQL variables" in str(e):
                high = guess
            else:
                raise
        else:
            low = guess
    cur.close()
    db.close()
    return low

SQLITE_MAX_VARIABLE_NUMBER = max_sql_variables()
Then I use the above variable to slice the data:
with database.atomic() as txn:
    size = (SQLITE_MAX_VARIABLE_NUMBER // len(data[0])) - 1
    # remove one to avoid issues if peewee adds some variable
    for i in range(0, len(data), size):
        table.insert_many(data[i:i+size]).upsert().execute()
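As an aside, if upgrading is an option: peewee 3.x replaced upsert() with on_conflict_replace() and ships a chunked() helper, so the batching can be written more compactly. A sketch assuming peewee 3 (the rest of this answer targets the peewee 2 API used in the question):

from peewee import chunked

# Sketch assuming peewee 3.x: chunked() slices the data for us, and
# on_conflict_replace() is the 3.x spelling of upsert().
with database.atomic():
    for batch in chunked(data, SQLITE_MAX_VARIABLE_NUMBER // len(data[0])):
        table.insert_many(batch).on_conflict_replace().execute()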
An update about the execution speed of max_sql_variables.

On a 3-year-old Intel machine with 4 cores and 4 GB of RAM, running openSUSE Tumbleweed, with SQLITE_MAX_VARIABLE_NUMBER set to 999, the function runs in less than 100 ms. If I set high = 1000000, the execution time becomes of the order of 300 ms.

On a newer Intel machine with 8 cores and 8 GB of RAM, running Kubuntu, with SQLITE_MAX_VARIABLE_NUMBER set to 250000, the function runs in about 2.6 seconds and returns 99999 (the binary search is capped by the initial high = 100000, so it cannot report the true limit of 250000). If I set high = 1000000, the execution time becomes of the order of 4.5 seconds.
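For reference, a quick way to time the function yourself (my own sketch; the answer does not show how these numbers were measured):

import timeit

# Time a single call to max_sql_variables() on this machine.
print(timeit.timeit(max_sql_variables, number=1))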
Looking here, https://www.sqlite.org/limits.html#max_column, it seems the limit should be 2000:

The SQLITE_MAX_COLUMN compile-time parameter is used to set an upper bound on:

- ... snip ...
- The number of values in an INSERT statement

I guess you're bumping against the limit somehow? At any rate, just chunk your input (as sketched below) or re-compile SQLite with higher limits.
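For completeness, a minimal sketch of the "chunk your input" suggestion, assuming the stock SQLITE_MAX_VARIABLE_NUMBER default of 999 rather than probing for it as the other answer does:

# Keep each INSERT under SQLite's default limit of 999 bound
# parameters: 999 // 8 columns == 124 rows per statement.
MAX_VARS = 999
rows_per_insert = MAX_VARS // len(colnames)

with database.atomic():
    for i in range(0, len(data), rows_per_insert):
        table.insert_many(data[i:i + rows_per_insert]).upsert().execute()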