
Which serialization format for key/value pairs is best indexable in an RDBMS?

Posted 2009-07-18 12:14:01. Tags: java, serialization, hash, indexing

I have a certain object type that is stored in a database. This type now gets additional information associated with it, which will differ in structure among instances. Although the information will be identically structured within groups of instances, the structure will only be known at runtime and will change over time.

I decided to just add a blob field to the table and store the key/value pairs there in some serialized format. From your experience, what format is most advisable?

In the context of my application, storage space is a secondary concern. There's one particular operation that I want to be fast: looking up the correct instance for a given set of key/value pairs (so it's a kind of variable-field composite key). I guess that means: is there a format that plays particularly well with typical database indexing?

Additionally, I might be interested in looking for a set of instances that share the same set of keys (an ad hoc "class", if you wish).

I'm writing this in Java and I'm storing in various types of SQL databases. I've got JSON, GPB and native Java serialization on my radar, favouring the cross-language formats. I can think of two basic strategies:

store the set of values in the table and add a foreign key to a separate table that contains the set of keys
store the key/value pairs in the table

2 answers

Not really an answer to your question, but have you considered looking at the Java Edition of BerkeleyDB? Duplicate keys and serialized values can be stored with this (fast) engine.

If your goal is to take advantage of database indexes, storing the unstructured data in a BLOB is not going to be effective. BLOBs are essentially opaque from the RDBMS's perspective, so the engine cannot index the individual keys and values inside them.

I gather from your description that the unstructured part of the data takes the form of an arbitrary set of key-value pairs associated with the object, right? Well, if the types of all keys are the same (e.g. they're all strings), I'd recommend simply creating a child table with (at least) three columns: the key, the value, and a foreign key to the parent object's row in its table. Since the keys will then be stored in the database as a regular column, they can be indexed effectively. The index should also include the foreign key to the parent table.
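To make the child-table idea concrete, here is a minimal sketch in Java that only builds the SQL text, so it runs without a database. All names in it (`object_attr`, `parent_object`, `parent_id`, `attr_key`, `attr_value`, the class `AttributeQueries`) are hypothetical, and it assumes keys and values are both stored as strings. The lookup for "the instance matching a given set of key/value pairs" is written as the usual match-any-pair, group, and count query; the exact DDL syntax may need adjusting per database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: every table and column name here is a hypothetical placeholder. */
final class AttributeQueries {

    // Child table: one row per key/value pair, pointing back at the parent row.
    // The primary key guarantees at most one value per key per parent.
    static final String CREATE_TABLE =
        "CREATE TABLE object_attr ("
      + " parent_id BIGINT NOT NULL REFERENCES parent_object(id),"
      + " attr_key VARCHAR(100) NOT NULL,"
      + " attr_value VARCHAR(255) NOT NULL,"
      + " PRIMARY KEY (parent_id, attr_key))";

    // Index covering key, value and the foreign key to the parent table.
    static final String CREATE_INDEX =
        "CREATE INDEX idx_object_attr_kv"
      + " ON object_attr (attr_key, attr_value, parent_id)";

    /** Builds a parameterized query for parents that have ALL of the given pairs. */
    static String findByPairsSql(int pairCount) {
        if (pairCount < 1) {
            throw new IllegalArgumentException("need at least one pair");
        }
        String onePair = "(attr_key = ? AND attr_value = ?)";
        String where = String.join(" OR ", Collections.nCopies(pairCount, onePair));
        // A parent matches only if every requested pair produced a row.
        return "SELECT parent_id FROM object_attr WHERE " + where
             + " GROUP BY parent_id HAVING COUNT(*) = " + pairCount;
    }

    /** Flattens the pairs into bind values, in the same order as the placeholders. */
    static List<String> bindParameters(Map<String, String> pairs) {
        List<String> params = new ArrayList<>();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            params.add(e.getKey());
            params.add(e.getValue());
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("color", "red");
        pairs.put("size", "L");
        System.out.println(findByPairsSql(pairs.size()));
        System.out.println(bindParameters(pairs));
    }
}
```

With JDBC, the generated string would go into a `PreparedStatement` and the bind values would be set in order. Note that this query returns parents that have at least the given pairs; if an instance must have exactly those pairs and no others, an extra check on the parent's total number of rows would be needed.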

A completely different approach would be to look at a "schemaless" database engine like CouchDB, which is specifically designed to deal with unstructured data. I have zero experience with such systems and I don't know how well the rest of your application would lend itself to this alternative storage strategy, but it might be worth looking into.