How to write (long) integer values to Berkeley DB using bsddb3?

I am trying to use Berkeley DB to store a frequency table (i.e. a hashtable with string keys and integer values). The table will be written, updated, and read from Python, so I am currently experimenting with bsddb3. It looks like it will do most of what I want, except that it seems to support only string values?

If I understand correctly, Berkeley DB supports any kind of binary key and value. Is there a way to efficiently pass raw long integers in and out of Berkeley DB using bsddb3? I know I can convert the values to and from strings, and this is probably what I will end up doing, but is there a more efficient way, i.e. by storing 'raw' integers?


Background: I am currently working with a large frequency table (potentially tens, if not hundreds, of millions of keys). It is currently implemented as a Python dictionary, but I abort the script when it starts to swap into virtual memory. Yes, I looked at Redis, but it stores the entire database in memory, so I'm about to try Berkeley DB. I should be able to improve creation efficiency with short-term in-memory caching, i.e. create an in-memory Python dictionary and periodically add it to the master Berkeley DB frequency table.
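The cache-then-merge idea from the background could be sketched roughly as follows. This is only an illustration: the names flush_counts and count_words and the flush_every threshold are hypothetical, a plain dict stands in for the on-disk table, and the counts are stored as decimal strings (the fallback mentioned in the question). It assumes the real store offers a dict-like get/set interface with bytes keys and values.

```python
from collections import Counter


def flush_counts(cache, store):
    """Add every count in `cache` to `store`, then empty the cache.

    `store` is assumed to be a mapping from bytes keys to bytes values;
    a plain dict is used as a stand-in for the on-disk table.
    """
    for word, delta in cache.items():
        key = word.encode("utf-8")
        raw = store.get(key)
        current = int(raw) if raw is not None else 0  # int() accepts ASCII digits as bytes
        store[key] = str(current + delta).encode("ascii")
    cache.clear()


def count_words(words, store, flush_every=100_000):
    """Count words in memory and merge into `store` once the cache grows large."""
    cache = Counter()
    for word in words:
        cache[word] += 1
        if len(cache) >= flush_every:
            flush_counts(cache, store)
    flush_counts(cache, store)  # merge whatever is left at the end


store = {}  # stand-in for the Berkeley DB table
count_words(["a", "b", "a", "c", "a"], store, flush_every=2)
```

With a tiny flush_every the cache is merged several times, yet the totals in store end up the same as with a single in-memory dictionary.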

Do you need to read the data back from a language other than Python? If not, you can just pickle the Python long integers and unpickle them when you read them back in. You can probably use the shelve module, which would do this for you automatically. But even if not, you can pickle and unpickle the values manually.

>>> import cPickle as pickle
>>> pickle.dumps(19999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999, pickle.HIGHEST_PROTOCOL)
'\x80\x02\x8a(\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x7fT\x97\x05p\x0b\x18J#\x9aA\xa5.{8=O,f\xfa\x81|\xa1\xef\xaa\xfd\xa2e\x02.'
>>> pickle.loads('\x80\x02\x8a(\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x7fT\x97\x05p\x0b\x18J#\x9aA\xa5.{8=O,f\xfa\x81|\xa1\xef\xaa\xfd\xa2e\x02.')
19999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999L
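For the shelve route, a minimal Python 3 sketch might look like this. Note the assumptions: the file name freq and the key spam are made up, and in Python 3 shelve picks whichever dbm backend is available on the installation, which is not necessarily Berkeley DB.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "freq")  # hypothetical database name

    # shelve pickles values on write and unpickles them on read.
    with shelve.open(path) as table:
        table["spam"] = 10 ** 30   # arbitrarily large integers are fine
        table["spam"] += 1         # read, add, write back

    # Reopen to show that the value was persisted on disk.
    with shelve.open(path) as table:
        restored = table["spam"]

print(restored == 10 ** 30 + 1)
```

Keys must be strings here; the pickling applies only to the values.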

Use Python's struct module to convert an integer to bytes in Python 3 or to a string in Python 2. Depending on your data you might use a different packing format; this one is for unsigned long long (uint64_t):

struct.pack('>Q', my_integer)

This returns the big-endian byte representation of my_integer, which matches the lexicographical order required for bsddb keys. You can come up with a smarter packing function (have a look at wiredtiger.intpacking) to save space.
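A small round-trip sketch shows why the byte order matters. The helper names pack_u64 and unpack_u64 are made up for this example, and it assumes every value fits in an unsigned 64-bit integer.

```python
import struct


def pack_u64(n):
    """Pack a non-negative integer below 2**64 into 8 big-endian bytes."""
    return struct.pack(">Q", n)  # raises struct.error if n is out of range


def unpack_u64(raw):
    """Inverse of pack_u64; struct.unpack always returns a tuple."""
    return struct.unpack(">Q", raw)[0]


values = [1, 256, 255, 2 ** 40, 0]
packed = [pack_u64(v) for v in values]
round_trip = [unpack_u64(p) for p in packed]

# Sorting the big-endian byte strings gives the same order as sorting the numbers.
big_sorted = [unpack_u64(p) for p in sorted(packed)]

# The little-endian packing does not have that property.
little_packed = sorted(struct.pack("<Q", v) for v in values)
little_sorted = [struct.unpack("<Q", p)[0] for p in little_packed]

print(big_sorted == sorted(values))
print(little_sorted == sorted(values))
```

Each packed value takes exactly 8 bytes regardless of magnitude, which is the space cost a variable-length packing scheme would try to reduce.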

You don't need a Python cache; use DBEnv.set_cache_max and DBEnv.set_cachesize.
