Python insert numpy array into sqlite3 database
I'm trying to store a numpy array of about 1000 floats in a sqlite3 database, but I keep getting the error "InterfaceError: Error binding parameter 1 - probably unsupported type".
I was under the impression a BLOB data type could be anything, but it definitely doesn't work with a numpy array. Here's what I tried:
import sqlite3 as sql
import numpy as np
con = sql.connect('test.bd',isolation_level=None)
cur = con.cursor()
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
cur.execute("INSERT INTO foobar VALUES (?,?)", (None,np.arange(0,500,0.5)))
con.commit()
Is there another module I can use to get the numpy array into the table? Or can I convert the numpy array into another form in Python (like a list, or a string I can split) that sqlite will accept? Performance isn't a priority. I just want it to work!
Thanks!
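For context (my framing, not part of the original question): out of the box, the sqlite3 module only knows how to bind None, int, float, str, and bytes, so the array has to be converted to one of those types, or an adapter has to be registered. The answers below use both routes; a minimal sketch of the raw-bytes route:

```python
import sqlite3
import numpy as np

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")

arr = np.arange(0, 500, 0.5)

# arr.tobytes() yields plain bytes, which sqlite3 binds as a BLOB.
cur.execute("INSERT INTO foobar VALUES (?,?)", (None, arr.tobytes()))

blob = cur.execute("SELECT array FROM foobar").fetchone()[0]
# np.frombuffer assumes float64 here, which matches np.arange's output above.
restored = np.frombuffer(blob)

assert np.array_equal(restored, arr)
```

The caveat is that tobytes stores only the raw buffer, so dtype and shape have to be tracked separately; the adapter-based answers below avoid that.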
You could register a new array data type with sqlite3:
import sqlite3
import numpy as np
import io

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)

# Converts np.ndarray to a BLOB when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts a BLOB back to np.ndarray when selecting
sqlite3.register_converter("array", convert_array)

x = np.arange(12).reshape(2, 6)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test (arr array)")
With this setup, you can simply insert the NumPy array with no change in syntax:
cur.execute("insert into test (arr) values (?)", (x, ))
And retrieve the array directly from sqlite as a NumPy array:
cur.execute("select arr from test")
data = cur.fetchone()[0]
print(data)
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]]
print(type(data))
# <class 'numpy.ndarray'>
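One property worth highlighting (a quick sanity check I added, not part of the original answer): because the adapter serializes with np.save, both dtype and shape survive the round trip, unlike a raw tobytes dump. A self-contained check using the same adapter/converter pair:

```python
import io
import sqlite3

import numpy as np

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())

def convert_array(blob):
    return np.load(io.BytesIO(blob))

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (arr array)")

x = np.arange(12, dtype=np.float32).reshape(3, 4)
con.execute("insert into test (arr) values (?)", (x,))
(roundtrip,) = con.execute("select arr from test").fetchone()

# np.save's header records dtype and shape, so both are restored exactly.
assert roundtrip.dtype == x.dtype
assert roundtrip.shape == x.shape
assert np.array_equal(roundtrip, x)
```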
I think the matlab format is a really convenient way to store and retrieve numpy arrays. It's really fast, and the disk and memory footprint are about the same.
(image from the mverleg benchmarks)
But if for any reason you need to store numpy arrays in SQLite, I suggest adding some compression. The extra lines needed on top of unutbu's code are pretty simple:
import zlib, bz2

compressor = zlib  # zlib or bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib uses a similar disk size to Matlab v5 .mat files;
    # bz2 compresses ~4x better than zlib, but storing is ~20x slower.
    # (Python 3 note: the old str.encode('zlib') codec trick no longer
    # exists, so the zlib/bz2 modules are used directly here.)
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(compressor.compress(out.getvalue()))

def convert_array(blob):
    return np.load(io.BytesIO(compressor.decompress(blob)))
The results of testing with the MNIST database were:
$ ./test_MNIST.py
[69900]: 99% remain: 0 secs
Storing 70000 images in 379.9 secs
Retrieve 6990 images in 9.5 secs
$ ls -lh example.db
-rw-r--r-- 1 agp agp 69M sep 22 07:27 example.db
$ ls -lh mnist-original.mat
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat
using zlib, and
$ ./test_MNIST.py
[69900]: 99% remain: 12 secs
Storing 70000 images in 8536.2 secs
Retrieve 6990 images in 37.4 secs
$ ls -lh example.db
-rw-r--r-- 1 agp agp 19M sep 22 03:33 example.db
$ ls -lh mnist-original.mat
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat
using bz2.
Comparing the Matlab V5 format with bz2 on SQLite, the bz2 compression ratio is around 2.8, but the access time is quite long compared to the Matlab format (almost instantaneous vs. more than 30 secs). It's maybe worthwhile only for really huge databases, where the learning process is much more time-consuming than access time, or where the database footprint needs to be as small as possible.
Finally, note that the bz2/zlib ratio is around 3.7, and zlib/matlab requires 30% more space.
The full code, if you want to experiment yourself, is:
import io
import os
import sys
import time
import sqlite3
import zlib, bz2

import numpy as np
# fetch_mldata comes from older sklearn.datasets versions; newer releases
# replaced it with fetch_openml.
from sklearn.datasets import fetch_mldata

compressor = zlib  # zlib or bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib uses a similar disk size to Matlab v5 .mat files;
    # bz2 compresses ~4x better than zlib, but storing is ~20x slower.
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(compressor.compress(out.getvalue()))

def convert_array(blob):
    return np.load(io.BytesIO(compressor.decompress(blob)))

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

dbname = 'example.db'

def test_save_sqlite_arrays():
    "Load MNIST database (70000 samples) and store in a compressed SQLite db"
    os.path.exists(dbname) and os.unlink(dbname)
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()
    cur.execute("create table test (idx integer primary key, X array, y integer);")

    mnist = fetch_mldata('MNIST original')
    X, y = mnist.data, mnist.target
    m = X.shape[0]
    t0 = time.time()
    for i, x in enumerate(X):
        cur.execute("insert into test (idx, X, y) values (?,?,?)",
                    (i, x, int(y[i])))
        if not i % 100 and i > 0:
            elapsed = time.time() - t0
            remain = float(m - i) / i * elapsed
            print("\r[%5d]: %3d%% remain: %d secs" % (i, 100 * i / m, remain), end='')
            sys.stdout.flush()

    con.commit()
    con.close()
    elapsed = time.time() - t0
    print()
    print("Storing %d images in %0.1f secs" % (m, elapsed))

def test_load_sqlite_arrays():
    "Query MNIST SQLite database and load some samples"
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()

    # select all images labeled as '2'
    t0 = time.time()
    cur.execute('select idx, X, y from test where y = 2')
    data = cur.fetchall()
    elapsed = time.time() - t0
    print("Retrieve %d images in %0.1f secs" % (len(data), elapsed))

if __name__ == '__main__':
    test_save_sqlite_arrays()
    test_load_sqlite_arrays()
This works for me:
import sqlite3 as sql
import numpy as np
import json

con = sql.connect('test.db', isolation_level=None)
cur = con.cursor()
cur.execute("DROP TABLE IF EXISTS foobar")
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
cur.execute("INSERT INTO foobar VALUES (?,?)",
            (None, json.dumps(np.arange(0, 500, 0.5).tolist())))
con.commit()
cur.execute("SELECT * FROM foobar")
data = cur.fetchall()
print(data)
my_list = json.loads(data[0][1])
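To close the loop on this JSON approach (a small addition of mine, not in the original answer), the loaded list can be turned back into an ndarray with one call; note the restored dtype is whatever json parsed (float64 here), not necessarily the original dtype:

```python
import json

import numpy as np

original = np.arange(0, 500, 0.5)

# JSON round-trips plain Python lists, so serialize via tolist() ...
serialized = json.dumps(original.tolist())
# ... and rebuild the ndarray from the parsed list on the way back.
restored = np.array(json.loads(serialized))

assert restored.shape == (1000,)
assert np.array_equal(restored, original)
```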
Happy Leap Second has it close, but I kept getting an automatic cast to string. Also, if you check out this other post (a fun debate on using buffer or Binary to push non-text data into sqlite), you'll see that the documented approach is to avoid buffer altogether and use this chunk of code:
import io

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())
I haven't heavily tested this in Python 3, but it seems to work in Python 2.7.
The other methods specified didn't work for me. There now seems to be a numpy.tobytes method, and a numpy.fromstring (which works on byte strings), but the latter is deprecated and the recommended replacement is numpy.frombuffer.
import sqlite3
import numpy as np

# Register against np.ndarray (the type); np.array is a function, not a type.
sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
sqlite3.register_converter("array", np.frombuffer)
I've tested it in my application, and it works well for me on Python 3.7.3 and numpy 1.16.2.
numpy.fromstring gives the same outputs, along with DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead.
Ready-to-use code based on @unutbu's answer (cleaned up a bit: no need to seek, etc.), tested with a 2D ndarray:
import sqlite3, numpy as np, io

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", lambda x: np.load(io.BytesIO(x)))

x = np.random.rand(100, 100)
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (arr array)")
con.execute("insert into test (arr) values (?)", (x, ))
for r in con.execute("select arr from test"):
    print(r[0])
You can use this instead (see @gavin's answer), if and only if you only work with 1D arrays:
sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
sqlite3.register_converter("array", np.frombuffer)
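One caveat behind that 1D restriction worth spelling out (my note, not from the answer): tobytes/frombuffer round-trips only the raw buffer, so the default float64 dtype and the flat shape must be corrected by hand for anything else:

```python
import numpy as np

x = np.arange(12, dtype=np.int32).reshape(3, 4)
raw = x.tobytes()  # raw buffer only: no dtype or shape metadata is stored

# np.frombuffer defaults to float64, so the dtype must be passed explicitly,
# and the original 2D shape has to be reapplied by hand.
y = np.frombuffer(raw, dtype=np.int32).reshape(3, 4)

assert np.array_equal(x, y)
```

The resulting array is also read-only, since it is a view on the bytes object; call .copy() on it if you need to mutate it.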