
Best way to use a PostgreSQL database as a simple key value store

I am required to use a PostgreSQL database, which will replace my current use of Berkeley DB. I realize this is not an ideal situation, but it is beyond my control.

So the question is: if you were required to turn PostgreSQL into a key-value store, how would you go about it while making it as efficient as possible?

My values are byte arrays and my keys are strings; I could impose some restrictions on the lengths of these strings.

I assume I should use a blob for the value and a primary key column holding the key, but as I am just starting out on this, I am curious whether anyone in the Stack Overflow community has done this, or whether there are any specific gotchas I should look out for.

The PostgreSQL extension for doing this properly is called hstore. It works much as you would expect from other key-value store systems. Just load the extension. The syntax is unique, but if you have ever used Redis or MongoDB you will pick it up quickly. Don't make it harder than it is. I understand; we often don't get to pick our tools and have to make do.
Here is the documentation page:

http://www.postgresql.org/docs/9.1/static/hstore.html
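As a rough illustration of what loading and using the extension looks like, here is a minimal sketch. The table and column names (app_settings, attrs) are made up for the example. Note that hstore keys and values are text strings, so byte-array values would probably need to be encoded first.

```sql
-- Sketch only: app_settings and attrs are hypothetical names.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE app_settings (
    id    serial PRIMARY KEY,
    attrs hstore
);

-- Store several text key/value pairs in a single hstore value.
INSERT INTO app_settings (attrs)
VALUES ('color => "red", size => "large"');

-- Look up one key; -> returns the value as text (NULL if the key is absent).
SELECT attrs -> 'color' AS color FROM app_settings;

-- Add or overwrite a key by concatenating another hstore.
UPDATE app_settings SET attrs = attrs || hstore('size', 'small');
```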

If you are forced to use a relational database, I would suggest trying to find structure in your data and taking advantage of it, since you are giving up the speed advantage you had with unstructured data and a key-value store. The more structure you find, the more you gain from your predicament, even if you only find structure in the keys.

Also consider whether you need sequential or random access to your data, and in what ratio, and structure your database accordingly. Are you going to query your values by type, for example? Each of these questions could affect how you structure your database.

One specific consideration about blobs in PostgreSQL: they are stored internally in pg_largeobject (loid oid, pageno int4, data bytea). The size of the chunks is defined by LOBLKSIZE, typically 2 kB. So if you can use byte arrays (bytea) in your own table instead of blobs, and keep the size of each key/value pair under that chunk size, you can avoid this indirection through a second table. You could also increase the block size if you control how the server is built, since block size is a compile-time setting.
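A minimal sketch of the bytea approach might look like the following. The table name kv_store and the 2000-byte cap are illustrative assumptions, not values taken from PostgreSQL; pick a limit that suits your own block size and data.

```sql
-- Sketch only: kv_store and the 2000-byte cap are illustrative assumptions.
CREATE TABLE kv_store (
    key   text  PRIMARY KEY,
    value bytea NOT NULL,
    -- Keep each key/value pair small; octet_length counts bytes.
    CONSTRAINT kv_pair_small
        CHECK (octet_length(key) + octet_length(value) <= 2000)
);

-- bytea literals can be written in hex format.
INSERT INTO kv_store (key, value) VALUES ('user:42', '\xdeadbeef');

SELECT value FROM kv_store WHERE key = 'user:42';
```

With the constraint in place, an oversized pair is rejected at insert time rather than silently stored.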

I'd suggest looking for structure in your data and patterns in your data access, and then asking your question again in more detail.

Another option is to use JSON or JSONB with a hash index on the key. (PostgreSQL only lets B-tree indexes be declared UNIQUE, so the hash index here speeds up lookups but does not by itself enforce uniqueness.)

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE key_values (
    key uuid DEFAULT uuid_generate_v4(),
    value jsonb
);

CREATE INDEX idx_key_values ON key_values USING hash (key);

Some queries:

postgres=# SELECT * FROM key_values WHERE key = '1cfc4dbf-a1b9-46b3-8c15-a03f51dde891';
Time: 0.514 ms
postgres=# SELECT * FROM key_values WHERE key = '1cfc4dbf-a1b9-46b3-8c15-a03f51dde890';
Time: 1.747 ms

postgres=# do $$
begin
for r in 1..1000 loop
INSERT INTO key_values (value)
VALUES ('{"somelarge_json": "bla"}');
end loop;
end;
$$;
DO
Time: 58.327 ms

You can't run efficient range queries as you can with a B-tree, but it should have better read/write performance. The index should be about 60% smaller.
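If you want to check the size and lookup claims against your own data, one possible way is to build a B-tree index on the same column for comparison. The index name idx_key_values_btree is hypothetical, and the results will depend on your PostgreSQL version and data volume.

```sql
-- Sketch only: idx_key_values_btree is a hypothetical comparison index.
CREATE INDEX idx_key_values_btree ON key_values USING btree (key);

-- Compare the on-disk size of the two indexes.
SELECT pg_size_pretty(pg_relation_size('idx_key_values'))       AS hash_size,
       pg_size_pretty(pg_relation_size('idx_key_values_btree')) AS btree_size;

-- Show which index the planner picks for an equality lookup.
EXPLAIN
SELECT * FROM key_values
WHERE key = '1cfc4dbf-a1b9-46b3-8c15-a03f51dde891';

-- Drop the comparison index once you are done.
DROP INDEX idx_key_values_btree;
```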

What do you need to store as a value? Strings? Ints? Objects (e.g. serialized Java objects)? A simple implementation would work with a three-column table looking like:

NAME(VARCHAR)   TYPE(VARCHAR)   VALUE(VARCHAR)

(Perhaps the TYPE is some enumeration.) The above wouldn't work for binary data such as serialised objects, though; perhaps you need a BLOB there.
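The three-column layout above could be sketched roughly as follows. The table name properties, the column lengths, and the list of allowed types are assumptions made for illustration.

```sql
-- Sketch only: table name, lengths and the allowed type list are assumptions.
CREATE TABLE properties (
    name  varchar(255) PRIMARY KEY,
    type  varchar(32)  NOT NULL
          CHECK (type IN ('string', 'int', 'double')),
    value varchar      NOT NULL
);

INSERT INTO properties (name, type, value)
VALUES ('max_retries', 'int', '5');

-- Cast the stored text back to the declared type when reading.
SELECT value::int AS max_retries
FROM properties
WHERE name = 'max_retries' AND type = 'int';
```

Everything is stored as text here, so the application (or a cast in the query) has to convert values back based on the type column.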

Alternatively (and probably a much better idea), have you seen Apache Commons Configuration? You can back it with a database (via JDBC) and store properties so that you retrieve them like this:

// get a property called 'number'
// ('double' is a reserved word in Java, so it cannot be used as a variable name)
double asDouble = config.getDouble("number");
int asInt = config.getInt("number");

That may save you a lot of grief in terms of implementation. You may have a problem with saving binary data, in that you'd have to serialise it prior to insertion and deserialise it after retrieval. But I've used this in the past for storing ints, doubles and serialised Java objects via XStream, so I can confirm it works well.

It really depends on what the key will be. If it will always be a string under 255 characters, then use a VARCHAR as your PK and a blob (assuming a large value) for the value. If it will always be a number, use an int, etc.
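For the string-key case, the basic put/get/delete operations might be sketched like this. The names kv, k and v are hypothetical, bytea stands in for the blob column, and the upsert syntax assumes PostgreSQL 9.5 or later.

```sql
-- Sketch only: kv, k and v are hypothetical names.
CREATE TABLE kv (
    k varchar(255) PRIMARY KEY,
    v bytea        NOT NULL
);

-- "put": insert the pair, or overwrite the value if the key already exists.
INSERT INTO kv (k, v)
VALUES ('greeting', '\x68656c6c6f')
ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v;

-- "get": fetch the value for a key.
SELECT v FROM kv WHERE k = 'greeting';

-- "delete": remove the pair.
DELETE FROM kv WHERE k = 'greeting';
```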

In other words, I need more info to really give you a good answer :)

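Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.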

 