Python insert numpy array into sqlite3 database

I'm trying to store a numpy array of about 1000 floats in a sqlite3 database but I keep getting the error "InterfaceError: Error binding parameter 1 - probably unsupported type".

I was under the impression a BLOB data type could be anything but it definitely doesn't work with a numpy array. Here's what I tried:

import sqlite3 as sql
import numpy as np
con = sql.connect('test.bd',isolation_level=None)
cur = con.cursor()
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
cur.execute("INSERT INTO foobar VALUES (?,?)", (None,np.arange(0,500,0.5)))
con.commit()

Is there another module I can use to get the numpy array into the table? Or can I convert the numpy array into another form in Python (like a list or string I can split) that sqlite will accept? Performance isn't a priority. I just want it to work!

Thanks!

You could register a new array data type with sqlite3:

import sqlite3
import numpy as np
import io

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)


# Converts np.ndarray to a BLOB (.npy bytes) when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts the stored bytes back to np.ndarray when selecting
sqlite3.register_converter("array", convert_array)

x = np.arange(12).reshape(2,6)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test (arr array)")

With this setup, you can simply insert the NumPy array with no change in syntax:

cur.execute("insert into test (arr) values (?)", (x, ))

And retrieve the array directly from sqlite as a NumPy array:

cur.execute("select arr from test")
data = cur.fetchone()[0]

print(data)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]]
print(type(data))
# <class 'numpy.ndarray'>  (Python 2 prints <type 'numpy.ndarray'>)
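
As a quick check of what actually lands in the table, here is a minimal sketch that repeats the adapter/converter registration and then asks SQLite for the storage class of the stored value. The variable names (`original`, `restored`, `storage_type`) are illustrative, and the sketch assumes Python 3 with a NumPy version where `np.save` and `np.load` accept `io.BytesIO` objects.

```python
import io
import sqlite3

import numpy as np


def adapt_array(arr):
    """Serialize an ndarray to .npy bytes so it can be bound as a BLOB."""
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())


def convert_array(blob):
    """Rebuild the ndarray from the .npy bytes read back from SQLite."""
    return np.load(io.BytesIO(blob))


sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (arr array)")

original = np.arange(12, dtype=np.int32).reshape(2, 6)
con.execute("insert into test (arr) values (?)", (original,))

# The declared column type triggers the converter on a plain select.
restored = con.execute("select arr from test").fetchone()[0]

# typeof() is an expression without a declared type, so no converter runs
# and we see the raw storage class SQLite used for the value.
storage_type, stored_bytes = con.execute(
    "select typeof(arr), length(arr) from test"
).fetchone()
con.close()

same_values = np.array_equal(original, restored)
same_layout = original.shape == restored.shape and original.dtype == restored.dtype
print(storage_type, stored_bytes, same_values, same_layout)
```

Because the `.npy` payload carries its own header, both the shape and the dtype survive the round trip, which is not the case when only the raw buffer is stored.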

I think the Matlab format is a really convenient way to store and retrieve numpy arrays. It is really fast, and the disk and memory footprints are much the same.

(Image: load/save/disk comparison, from mverleg's benchmarks)

But if for any reason you need to store the numpy arrays in SQLite, I suggest adding some compression.

The extra lines on top of unutbu's code are pretty simple:

import codecs

compressor = 'zlib'  # zlib, bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib gives a disk size similar to Matlab v5 .mat files
    # bz2 compresses about 4 times better than zlib, but storing is about 20 times slower.
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    # bytes has no .encode() in Python 3, so go through codecs.encode()
    return sqlite3.Binary(codecs.encode(out.read(), compressor))  # zlib, bz2

def convert_array(blob):
    # Decompress first, then let np.load parse the .npy payload
    out = io.BytesIO(codecs.decode(blob, compressor))
    return np.load(out)
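
The same idea can also be sketched with the `zlib` module called directly rather than through a codec name. This is only an illustration: `pack_array`, `unpack_array` and the synthetic 28x28 `sample` image are hypothetical names and data, not part of the benchmark above, and the compression level is an arbitrary choice.

```python
import io
import zlib

import numpy as np


def pack_array(arr, level=6):
    """Serialize an ndarray to .npy bytes and compress them with zlib."""
    out = io.BytesIO()
    np.save(out, arr)
    return zlib.compress(out.getvalue(), level)


def unpack_array(blob):
    """Decompress the bytes and load the ndarray from the .npy payload."""
    return np.load(io.BytesIO(zlib.decompress(blob)))


# Synthetic stand-in for an MNIST-like image: mostly zeros with one bright patch.
sample = np.zeros((28, 28), dtype=np.uint8)
sample[10:18, 10:18] = 255

# Size of the uncompressed .npy payload, for comparison.
plain = io.BytesIO()
np.save(plain, sample)
plain_size = len(plain.getvalue())

packed = pack_array(sample)
restored = unpack_array(packed)

print(plain_size, len(packed), np.array_equal(sample, restored))
```

Both functions could be passed to `sqlite3.register_adapter` and `sqlite3.register_converter` in place of the codec-based pair, since sqlite3 stores `bytes` values as BLOBs.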

The results of testing with the MNIST database were:

$ ./test_MNIST.py
[69900]:  99% remain: 0 secs   
Storing 70000 images in 379.9 secs
Retrieve 6990 images in 9.5 secs
$ ls -lh example.db 
-rw-r--r-- 1 agp agp 69M sep 22 07:27 example.db
$ ls -lh mnist-original.mat 
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat

using zlib, and

$ ./test_MNIST.py
[69900]:  99% remain: 12 secs   
Storing 70000 images in 8536.2 secs
Retrieve 6990 images in 37.4 secs
$ ls -lh example.db 
-rw-r--r-- 1 agp agp 19M sep 22 03:33 example.db
$ ls -lh mnist-original.mat 
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat

using bz2.

Comparing the Matlab V5 format with bz2 on SQLite, the bz2 compression ratio is around 2.8, but the access time is quite long compared to the Matlab format (almost instantaneous vs. more than 30 secs). It may be worthwhile only for really huge databases where the learning process takes much longer than the access time, or where the database footprint needs to be as small as possible.

Finally, note that the zlib database is around 3.7 times larger than the bz2 one, and that zlib requires about 30% more space than the Matlab file.
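
For reference, the ratios can be recomputed from the figures in the two test runs. This is only back-of-the-envelope arithmetic: the sizes are the rounded values printed by `ls -lh`, so the results can differ slightly from ratios computed on exact byte counts, and the variable names are illustrative.

```python
# File sizes in MB as reported by `ls -lh` in the runs above (rounded).
mat_mb = 53    # mnist-original.mat
zlib_mb = 69   # example.db with zlib
bz2_mb = 19    # example.db with bz2

# Storing times in seconds from the same runs.
zlib_store_secs = 379.9
bz2_store_secs = 8536.2

mat_over_bz2 = mat_mb / bz2_mb                    # how much smaller bz2 is than .mat
zlib_over_bz2 = zlib_mb / bz2_mb                  # how much smaller bz2 is than zlib
zlib_extra_pct = (zlib_mb / mat_mb - 1) * 100     # extra space zlib needs vs .mat
store_slowdown = bz2_store_secs / zlib_store_secs # how much slower bz2 stores

print("%.1f %.1f %.0f%% %.1f" % (mat_over_bz2, zlib_over_bz2, zlib_extra_pct, store_slowdown))
# 2.8 3.6 30% 22.5
```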

The full code, if you want to play with it yourself, is:

import codecs
import io
import os
import sqlite3
import sys
import time

import numpy as np
# fetch_mldata was removed from newer scikit-learn releases; with those,
# fetch_openml('mnist_784') is the usual replacement and needs some adapting.
from sklearn.datasets import fetch_mldata

compressor = 'zlib'  # zlib, bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib gives a disk size similar to Matlab v5 .mat files
    # bz2 compresses about 4 times better than zlib, but storing is about 20 times slower.
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    # bytes has no .encode() in Python 3, so go through codecs.encode()
    return sqlite3.Binary(codecs.encode(out.read(), compressor))  # zlib, bz2

def convert_array(blob):
    # Decompress first, then let np.load parse the .npy payload
    out = io.BytesIO(codecs.decode(blob, compressor))
    return np.load(out)

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

dbname = 'example.db'
def test_save_sqlite_arrays():
    "Load MNIST database (70000 samples) and store in a compressed SQLite db"
    if os.path.exists(dbname):
        os.unlink(dbname)
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()
    cur.execute("create table test (idx integer primary key, X array, y integer );")

    mnist = fetch_mldata('MNIST original')

    X, y =  mnist.data, mnist.target
    m = X.shape[0]
    t0 = time.time()
    for i, x in enumerate(X):
        # store the image x (not the whole label vector y) with its label
        cur.execute("insert into test (idx, X, y) values (?,?,?)",
                    (i, x, int(y[i])))
        if not i % 100 and i > 0:
            elapsed = time.time() - t0
            remain = float(m - i) / i * elapsed
            print("\r[%5d]: %3d%% remain: %d secs" % (i, 100 * i // m, remain), end="")
            sys.stdout.flush()

    con.commit()
    con.close()
    elapsed = time.time() - t0
    print()
    print("Storing %d images in %0.1f secs" % (m, elapsed))

def test_load_sqlite_arrays():
    "Query MNIST SQLite database and load some samples"
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()

    # select all images labeled as '2'
    t0 = time.time()
    cur.execute('select idx, X, y from test where y = 2')
    data = cur.fetchall()
    elapsed = time.time() - t0
    print("Retrieve %d images in %0.1f secs" % (len(data), elapsed))


if __name__ == '__main__':
    test_save_sqlite_arrays()
    test_load_sqlite_arrays()

This works for me:

import sqlite3 as sql
import numpy as np
import json
con = sql.connect('test.db',isolation_level=None)
cur = con.cursor()
# IF EXISTS avoids an error on the first run, when the table is not there yet
cur.execute("DROP TABLE IF EXISTS foobar")
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
cur.execute("INSERT INTO foobar VALUES (?,?)", (None, json.dumps(np.arange(0,500,0.5).tolist())))
con.commit()
cur.execute("SELECT * FROM foobar")
data = cur.fetchall()
print(data)
# fetchall() has already consumed the cursor, so reuse data instead of fetching again
my_list = json.loads(data[0][1])
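
If an ndarray is wanted back rather than a list, the JSON round trip can be closed with `np.array`. This is a minimal sketch using an in-memory database and a TEXT column (an assumption made here for clarity, since the JSON payload is a string); `original` and `restored` are illustrative names.

```python
import json
import sqlite3

import numpy as np

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array TEXT)")

original = np.arange(0, 500, 0.5)

# tolist() turns the array into plain Python floats that json can serialize.
con.execute("INSERT INTO foobar (array) VALUES (?)", (json.dumps(original.tolist()),))

row = con.execute("SELECT array FROM foobar").fetchone()
con.close()

# json.loads gives back a list of floats; np.array rebuilds a float64 array.
restored = np.array(json.loads(row[0]))
print(restored.shape, restored.dtype, np.array_equal(original, restored))
```

JSON keeps the nesting of multi-dimensional lists, but the dtype is inferred again on loading, so an integer or float32 array would need an explicit `dtype` argument to come back unchanged.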

Happy Leap Second has it close, but I kept getting an automatic cast to string. Also, if you check out this other post (a fun debate on using buffer or Binary to push non-text data into sqlite), you'll see that the documented approach is to avoid buffer altogether and use this chunk of code.

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

I haven't heavily tested this in Python 3, but it seems to work in Python 2.7.

The other methods specified didn't work for me. There now seems to be an ndarray.tobytes method, and numpy.fromstring (which works on byte strings) is deprecated; the recommended replacement is numpy.frombuffer.

import sqlite3
import numpy as np

sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())  # the type is np.ndarray; np.array is a function
sqlite3.register_converter("array", np.frombuffer)

I've tested it in my application and it works well for me on Python 3.7.3 and numpy 1.16.2.

numpy.fromstring gives the same output, along with "DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead".

Ready-to-use code based on @unutbu's answer (cleaned up a bit, no need for seek, etc.), tested with a 2D ndarray:

import sqlite3, numpy as np, io

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", lambda x: np.load(io.BytesIO(x)))

x = np.random.rand(100, 100)
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (arr array)")
con.execute("insert into test (arr) values (?)", (x, ))
for r in con.execute("select arr from test"):
    print(r[0])

You can use this (see @gavin's answer) instead if and only if you only work with 1D arrays of the default float64 dtype, since the raw bytes carry neither the shape nor the dtype:

sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
sqlite3.register_converter("array", np.frombuffer)
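
To illustrate why the raw-buffer approach is limited, here is a small sketch (no database involved) of what `np.frombuffer` returns for a 2D integer array, and what has to be supplied by hand to get the original back. The names `original`, `naive` and `restored` are illustrative.

```python
import numpy as np

original = np.arange(12, dtype=np.int32).reshape(2, 6)
raw = original.tobytes()  # 48 bytes, with no shape or dtype information

# With no arguments, frombuffer interprets the bytes as a flat float64 array.
naive = np.frombuffer(raw)

# Recovering the array requires knowing the dtype and shape out of band.
restored = np.frombuffer(raw, dtype=np.int32).reshape(2, 6)

print(naive.shape, naive.dtype)
print(np.array_equal(original, restored))
```

Note also that an array built by `np.frombuffer` on a `bytes` object is read-only; calling `.copy()` on it gives a writable array.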
