
Concatenate two PostgreSQL tsvector fields from separate tables in a single view to enable joined full text search

I have a PostgreSQL view that combines 3 tables:

create view search_view as 
select u.first_name, u.last_name, a.notes, a.summary, a.search_index 
from "user" as u, assessor as a, connector as c  -- user is a reserved word in PostgreSQL, so the table name is quoted
where a.connector_id = c.id and c.user_id = u.id;

However, I need to concatenate tsvector fields from 2 of the 3 tables into a single tsvector field in the view, providing full text search across 4 fields: 2 from one table and 2 from another.

I've read the documentation stating that I can use the concatenation operator (||) to combine two tsvector fields, but I'm not certain what this looks like syntactically, or whether there are potential gotchas with this implementation.

I'm looking for example code that concatenates two tsvector fields from separate tables into a view, and also commentary on whether this is good or bad practice in PostgreSQL land.

I was wondering the same thing. I don't think we are supposed to be combining tsvectors from multiple tables like this. The best solution is to:

  1. Create a new tsv column in each of your tables (user, assessor, connector).
  2. Update the new tsv column in each table with all of the text you want to search. For example, in the user table you would update the tsv column of all records by concatenating the first_name and last_name columns.
  3. Create an index on the new tsv column; this will be faster than indexing the individual columns.
  4. Run your queries as usual, and let Postgres do the "thinking" about which indexes to use. It may or may not use all indexes in queries involving more than one table.
  5. Use EXPLAIN (and EXPLAIN ANALYZE) to look at how Postgres is using your new indexes for particular queries; this will give you insight into speeding things up further.
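
The steps above could be sketched roughly as follows for the user table. This is a minimal sketch, not a definitive setup: the column name tsv, the index name user_tsv_idx, the 'simple' text search configuration and the search term are all assumptions made for illustration.

```sql
-- Assumption: a table named "user" with text columns first_name and last_name.
-- The names tsv and user_tsv_idx are illustrative only.

-- Step 1: add a tsvector column.
ALTER TABLE "user" ADD COLUMN tsv tsvector;

-- Step 2: populate it from the text columns to be searched.
-- coalesce() prevents a NULL name from turning the whole value into NULL.
UPDATE "user"
SET tsv = to_tsvector('simple',
                      coalesce(first_name, '') || ' ' || coalesce(last_name, ''));

-- Step 3: index the new column (GIN is the usual choice for tsvector columns).
CREATE INDEX user_tsv_idx ON "user" USING GIN (tsv);

-- Steps 4 and 5: run a query and inspect the plan Postgres chooses.
EXPLAIN ANALYZE
SELECT first_name, last_name
FROM "user"
WHERE tsv @@ to_tsquery('simple', 'smith');
```

Note that the UPDATE is a one-off backfill. Keeping the column current afterwards needs something extra, such as a trigger or, on PostgreSQL 12 and later, a stored generated column.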

This will be my approach at least. I too have been doing lots of reading and have found that people aren't combining data from multiple tables into tsvectors. In fact I don't think this is possible; it may only be possible to use the columns of the current table when creating a tsvector.

Concatenating tsvectors works, but as per the comments, an index is probably not used this way (I'm not an expert, so I can't say whether it is or not).

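SELECT * FROM newsletters
LEFT JOIN campaigns ON newsletters.campaign_id = campaigns.id
-- coalesce keeps newsletters without a campaign: tsvector || NULL yields NULL
WHERE (newsletters.tsv || coalesce(campaigns.tsv, ''::tsvector)) @@ to_tsquery(unaccent(?))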
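
Applied to the tables from the question, a view along these lines might be a starting point. It is only a sketch under assumptions: the question shows a.search_index but not the second tsvector column, so u.search_index on the user table is hypothetical, as are the 'english' configuration and the search terms.

```sql
-- Assumption: both "user" and assessor have a tsvector column named search_index.
CREATE VIEW search_view AS
SELECT u.first_name, u.last_name, a.notes, a.summary,
       -- || merges the two tsvectors; coalesce guards against NULL columns,
       -- because concatenating with NULL would make the whole result NULL.
       coalesce(u.search_index, ''::tsvector)
         || coalesce(a.search_index, ''::tsvector) AS search_index
FROM "user" AS u
JOIN connector AS c ON c.user_id = u.id
JOIN assessor  AS a ON a.connector_id = c.id;

-- Usage: a single query that can match terms coming from either table.
SELECT first_name, last_name
FROM search_view
WHERE search_index @@ to_tsquery('english', 'smith & audit');
```

One likely gotcha: a plain view cannot be indexed, and as far as I can tell the GIN indexes on the individual search_index columns do not help with the concatenated expression, so the match is probably evaluated row by row. A materialized view with its own GIN index may be worth testing if that turns out to be too slow.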

The reason you'd want this is to search for an AND string like txt1 & txt2 & txt3, which is a very common usage scenario. If you simply split the search with an OR, as in WHERE newsletters.tsv @@ to_tsquery(unaccent(?)) OR campaigns.tsv @@ to_tsquery(unaccent(?)), this won't work: each side of the OR has to match all 3 tokens within a single tsv column, but the tokens could be spread across both columns.
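
The difference can be reproduced without any tables. In this small sketch the words alpha, beta and gamma and the 'simple' configuration are made up for illustration; a and b stand in for the two tsv columns.

```sql
SELECT
  (a @@ q) OR (b @@ q) AS split_with_or,   -- false: neither vector alone holds both terms
  (a || b) @@ q        AS concatenated     -- true: the merged vector holds both terms
FROM (
  SELECT to_tsvector('simple', 'alpha beta')    AS a,  -- stands in for newsletters.tsv
         to_tsvector('simple', 'gamma')         AS b,  -- stands in for campaigns.tsv
         to_tsquery('simple', 'alpha & gamma')  AS q
) AS s;
```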

One solution I found is to use triggers to insert and update the tsv column in table1 whenever table2 changes, see: https://dba.stackexchange.com/questions/154011/postgresql-full-text-search-tsv-column-trigger-with-many-to-many but this is not a definitive answer, and using that many triggers is error prone and hacky.

The official documentation and some tutorials also show concatenating all the wanted columns into a tsvector on the fly, without using a tsv column. But it is unclear how much slower the on-the-fly approach is compared with the tsv column approach; I can't find a single benchmark or explanation of this. The documentation simply states:

Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches. (This is more important when using a GiST index than a GIN index; see Section 12.9.) The expression-index approach is simpler to set up, however, and it requires less disk space since the tsvector representation is not stored explicitly.

All I can tell from this is that tsv columns are probably a waste of resources and just complicate things, but it would be nice to see some hard numbers. And if you can concatenate tsv columns like this, then I guess it's no different from doing it in a WHERE clause.
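
For reference, the on-the-fly (expression index) approach mentioned in the documentation looks roughly like this. It is a sketch under assumptions: the text columns title and body on newsletters, the index name, the 'english' configuration and the search terms are all hypothetical.

```sql
-- Assumption: newsletters has text columns title and body (hypothetical names).
-- The two-argument form of to_tsvector is used because an index expression
-- must not depend on the default_text_search_config setting.
CREATE INDEX newsletters_fts_idx ON newsletters
USING GIN (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, '')));

-- The WHERE clause has to repeat the indexed expression, including the
-- configuration name, for the planner to consider the index.
SELECT *
FROM newsletters
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('english', 'postgres & search');
```

An expression index like this can only reference columns of the table it is built on, so on its own it does not cover the cross-table case either.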
