
Hybrid “Index-like” btree structure - can PostgreSQL do this?

I am new to PostgreSQL. I have a very unusual requirement for a hybrid database I need to build. From the modules I've seen, it seems to me that the following is possible.

I need to be able to add key-[values] entries to an index without actually adding data to a table. Simply put, I need a key-[values] store, ideally as a btree (for lookup speed). An index structure would be ideal, though perhaps another structure would also do this.

To be very specific, I wish to store something like:

KEY     [IDs]
Blue    10, 20, 23, 47
Green   5, 12, 40

I don't want the overhead of storing this data AND indexing it. I just need the data "indexed but not stored", so to speak.

Equally important is the ability to query these structures and get the data (IDs) out, to perform INTERSECT etc. on the IDs, and to use IN, BETWEEN, =, etc. on the keys.

As you can probably guess, the end goal is a final list of IDs, which would then be sent to the client and looked up at will.

EDIT

What I don't want is to record the key for every value. Using the example above, I don't want to store {Blue, 10}, {Blue, 20} etc. I want to store {Blue, [10, 20, 23, 47]}.

If I store this as a traditional table, I cannot see a way around this duplication problem.

Looking again at {Blue, [10, 20, 23, 47]}, this is technically nothing more than a single btree, where the IDs (10, 20, 23, 47) are marked as values and the parent "Blue" is marked as a key.

Since this data type mismatch could be messy in a single tree, I believe the ideal solution is "[btrees] in a btree", where the outer btree holds the keys and each of the [btrees] holds the group of values for one key.

If you really insist on doing it this way, you can store the values as an array, and the intarray module provides operators to manipulate those. That is:

create table data(key text primary key, values int[] not null);
insert into data
  values('Blue', '{10,20,23,47}'),('Green','{5,12,40}'),('Red', '{5,10,28}');

With this you can write:

select unnest(values) from data where key = 'Blue'
intersect
select unnest(values) from data where key = 'Red';

Ideally you need an aggregate function to convert an int[] to a set and calculate intersections etc., but they don't seem to be provided.

Really, this is just a slightly more compact storage of the more typical structure:
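For the two-key case, the intarray operators mentioned earlier can work on the arrays directly, without unnesting. The following is a minimal sketch, assuming the intarray extension is installed and the data table defined above is in place; the aliases b and r and the column name common_ids are illustrative only.

```sql
-- Assumes the intarray extension is available on the server.
create extension if not exists intarray;

-- intarray's binary & operator returns the intersection of two int[] values,
-- so the IDs shared by 'Blue' and 'Red' come back as a single array.
select b.values & r.values as common_ids
from data as b
cross join data as r
where b.key = 'Blue'
  and r.key = 'Red';
```

With the sample rows above, the only ID shared by Blue and Red is 10. The same module also offers | for union and - for difference. These are binary operators rather than aggregates, so combining an arbitrary number of keys still needs unnest or a chain of operator calls.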

select key, unnest(values) as value from data;
  key  | value
-------+-------
 Blue  |    10
 Blue  |    20
 Blue  |    23
[...]

In fact, you can simply define a view to be the above query.

A more normalised approach would be to have two tables: one to describe keys, one to associate them with values:
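A sketch of such a view, assuming the data table defined above; the view name data_pairs is hypothetical:

```sql
-- Expose the array storage as ordinary (key, value) rows.
create view data_pairs as
  select key, unnest(values) as value
  from data;

-- The view can then be queried like a conventional two-column table.
select value
from data_pairs
where key = 'Blue'
order by value;
```

Queries written against the view keep working if the underlying storage later changes from arrays to one row per pair, as long as the view is redefined to match.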

create table key_dimension(key_id serial primary key, key text not null unique);
insert into key_dimension(key) values('Blue'),('Green'),('Red');
create table key_value(key_id int not null references key_dimension(key_id), value int not null, primary key(key_id, value));
insert into key_value(key_id, value)
  select key_id, unnest(values) from key_dimension join data using (key);

And now:

select value from key_value
  where key_id = (select key_id from key_dimension where key = 'Red')
intersect
select value from key_value
  where key_id = (select key_id from key_dimension where key = 'Blue');

So any query that selects keys needs to run only against the set of keys (key_dimension); a minimal synthetic key (key_id) is then used to convert the result into the actual sets of data values (from key_value).
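The key predicates asked about in the question (IN, BETWEEN, =) fit this layout as plain filters on key_dimension. The queries below are a sketch against the two tables defined above, assuming they were populated as shown; the aliases kv and kd and the BETWEEN bounds are illustrative.

```sql
-- IDs stored under any of the listed keys (set union).
select distinct kv.value
from key_value as kv
join key_dimension as kd using (key_id)
where kd.key in ('Blue', 'Green')
order by kv.value;

-- IDs stored under a range of keys; the range depends on the column's collation.
select distinct kv.value
from key_value as kv
join key_dimension as kd using (key_id)
where kd.key between 'A' and 'H'
order by kv.value;

-- IDs stored under every one of the listed keys (set intersection).
-- count(*) = 2 is safe because (key_id, value) is the primary key and
-- key is unique, so each (key, value) pair appears at most once.
select kv.value
from key_value as kv
join key_dimension as kd using (key_id)
where kd.key in ('Blue', 'Red')
group by kv.value
having count(*) = 2;
```

The last form generalises the INTERSECT query to any number of keys: list the keys in the IN clause and compare the count against the number of keys listed.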
