
Saving numpy array in mongodb

I have a couple of MongoDB documents in which one of the fields is best represented as a matrix (numpy array). I would like to save this document to MongoDB. How do I do this?

{
'name' : 'subject1',
'image_name' : 'blah/foo.png',
'feature1' : np.array(...)
}

For a 1D numpy array, you can use lists:

import numpy as np

# serialize 1D array x
record['feature1'] = x.tolist()

# deserialize 1D array x (np.fromiter requires a dtype; float is assumed here)
x = np.fromiter(record['feature1'], dtype=float)

For multidimensional data, I believe you'll need to use pickle and pymongo.binary.Binary:

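import pickle
import pymongo.binary  # old PyMongo releases only; current releases provide bson.binary.Binary

# serialize 2D array y
record['feature2'] = pymongo.binary.Binary(pickle.dumps(y, protocol=2))

# deserialize 2D array y
y = pickle.loads(record['feature2'])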
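For what it's worth, the `tolist()` approach is not strictly limited to 1D arrays: for an N-dimensional array it returns nested Python lists, and `np.array` rebuilds the same shape from that nesting. What is lost is the dtype, so the minimal sketch below stores it in a separate field. The plain dict `record` stands in for the MongoDB document, and the field name `feature2_dtype` is a hypothetical choice, not part of any library.

```python
import numpy as np

# A plain dict stands in for the MongoDB document (no server needed to show the idea)
record = {}

y = np.arange(6, dtype=np.float32).reshape(2, 3)

# serialize: tolist() yields nested Python lists, which BSON stores as nested arrays
record['feature2'] = y.tolist()
# hypothetical extra field so the dtype survives the round trip (e.g. '<f4')
record['feature2_dtype'] = y.dtype.str

# deserialize: np.array rebuilds the (2, 3) shape from the nesting,
# and the stored dtype string restores float32 instead of the default float64
y_restored = np.array(record['feature2'], dtype=record['feature2_dtype'])

print(y_restored.shape, y_restored.dtype)  # (2, 3) float32
```

Lists are readable and queryable inside MongoDB, but they are much larger than a binary blob for big arrays, which is the main reason to prefer one of the binary approaches for large matrices.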

The code pymongo.binary.Binary(...) didn't work for me; maybe we need to use bson as @tcaswell suggested.

Anyway, here is one solution for multi-dimensional numpy arrays:

from bson.binary import Binary
import pickle

# convert numpy array to Binary, store record in mongodb
# (subtype=128 is the value of bson.binary.USER_DEFINED_SUBTYPE)
record['feature2'] = Binary(pickle.dumps(npArray, protocol=2), subtype=128)
# get record from mongodb, convert Binary to numpy array
npArray = pickle.loads(record['feature2'])

That said, the credit goes to MongoWrapper; I used the code written by them.
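Pickle ties the stored bytes to Python and should only be unpickled from trusted sources. A different technique, sketched below as an alternative rather than as what the answers above do, is to store the raw array buffer together with its dtype and shape. The helper names `encode_array`/`decode_array` and the field names `data`, `dtype`, `shape` are hypothetical. It relies on the fact that PyMongo on Python 3 stores plain `bytes` as BSON binary (subtype 0) and returns it as `bytes`, so no `Binary` wrapper is needed; a dict again stands in for the document.

```python
import numpy as np

def encode_array(arr):
    # Hypothetical helper: raw C-ordered buffer plus what is needed to rebuild it
    return {
        'data': arr.tobytes(),      # bytes -> BSON binary subtype 0 under PyMongo
        'dtype': arr.dtype.str,     # e.g. '<f8', includes byte order
        'shape': list(arr.shape),   # BSON has no tuple type, so use a list
    }

def decode_array(doc):
    # np.frombuffer gives a read-only view of the bytes; copy() makes it writable
    flat = np.frombuffer(doc['data'], dtype=np.dtype(doc['dtype']))
    return flat.reshape(doc['shape']).copy()

y = np.random.default_rng(0).random((3, 4))
record = {'feature2': encode_array(y)}
y2 = decode_array(record['feature2'])
```

Compared with pickle, this keeps the stored format language-neutral (any client that knows the dtype and shape can read it), at the cost of handling only plain numeric arrays, not object arrays.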

We've built an open-source library for storing numeric data (Pandas, numpy, etc.) in MongoDB:

https://github.com/manahl/arctic

Best of all, it's really easy to use, pretty fast, and supports data versioning, multiple data libraries and more.

I know this is an old question, but here is an elegant solution that works in newer versions of pymongo:

import pickle
from bson.binary import Binary, USER_DEFINED_SUBTYPE
from bson.codec_options import TypeCodec, TypeRegistry, CodecOptions
import numpy as np

class NumpyCodec(TypeCodec):
    python_type = np.ndarray
    bson_type = Binary

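    def transform_python(self, value):
        # ndarray -> pickled bytes, tagged with the user-defined subtype (128)
        return Binary(pickle.dumps(value, protocol=2), USER_DEFINED_SUBTYPE)

    def transform_bson(self, value):
        # Only unpickle our own subtype; leave any other Binary value untouched
        if value.subtype == USER_DEFINED_SUBTYPE:
            return pickle.loads(value)
        return value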

def get_codec_options():
    numpy_codec = NumpyCodec()
    type_registry = TypeRegistry([numpy_codec])
    codec_options = CodecOptions(type_registry=type_registry)
    return codec_options

def get_collection(name, db):
    codec_options = get_codec_options()
    return db.get_collection(name, codec_options=codec_options)

Then you can get your collection this way:

from pymongo import MongoClient
client = MongoClient()
db = client['my_db']
my_collection = get_collection('my_collection', db)

Afterwards, you can insert and find documents containing NumPy arrays transparently.

Have you tried Monary?

They have examples on their site:

http://djcinnovations.com/index.php/archives/103

Have you tried MongoWrapper? I think it's simple:

Declare a connection to the MongoDB server and the collection in which to save your numpy arrays:

import numpy as np
import monogowrapper as mdb
db = mdb.MongoWrapper(dbName='test',
                      collectionName='test_collection', 
                      hostname="localhost", 
                      port="27017") 
my_dict = {"name": "Important experiment", 
            "data":np.random.random((100,100))}

The dictionary is just as you'd expect it to be:

print(my_dict)
{'data': array([[ 0.773217,  0.517796,  0.209353, ...,  0.042116,  0.845194,
         0.733732],
       [ 0.281073,  0.182046,  0.453265, ...,  0.873993,  0.361292,
         0.551493],
       [ 0.678787,  0.650591,  0.370826, ...,  0.494303,  0.39029 ,
         0.521739],
       ..., 
       [ 0.854548,  0.075026,  0.498936, ...,  0.043457,  0.282203,
         0.359131],
       [ 0.099201,  0.211464,  0.739155, ...,  0.796278,  0.645168,
         0.975352],
       [ 0.94907 ,  0.363454,  0.912208, ...,  0.480943,  0.810243,
         0.217947]]),
 'name': 'Important experiment'}

Save the data to MongoDB:

db.save(my_dict)

To load the data back:

my_loaded_dict = db.load({"name":"Important experiment"})

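Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.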

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM