Fuzzy matching a substring within a larger string in Postgres

Question

Is it possible to fuzzy match a substring within a larger string in Postgres?

Example:

A search for colour (with "ou") should return all records whose string contains color, colors, or colour.

select 
  * 
from things
where fuzzy(color) in description;

id | description
----------------
1  | A red coloured car
2  | The garden
3  | Painting colors

=> return records 1 and 3

I am wondering whether fuzzystrmatch and tsvector can be combined so that fuzzy matching is applied to each vectorized term.

Or is there another approach?

Answer 1

You can certainly do this, but I doubt it would be very useful:

select *,levenshtein(lexeme,'color')  from things, unnest(to_tsvector('english',description))
   order by levenshtein;

 id |    description     | lexeme | positions | weights | levenshtein 
----+--------------------+--------+-----------+---------+-------------
  3 | Painting colors    | color  | {2}       | {D}     |           0
  1 | A red coloured car | colour | {3}       | {D}     |           1
  1 | A red coloured car | car    | {4}       | {D}     |           3
  1 | A red coloured car | red    | {2}       | {D}     |           5
  3 | Painting colors    | paint  | {1}       | {D}     |           5
  2 | The garden         | garden | {2}       | {D}     |           6

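Presumably you would want to embellish the query to apply some cutoff, probably one that depends on the length, and to return only the best result for each description, provided it meets that cutoff. Doing that should be routine SQL manipulation.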
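As a rough sketch of that embellishment (not tested against a real table): the query below keeps, for each row, only the closest lexeme, and only if its distance is within a cutoff. The cutoff of one edit per three characters of the search term is an arbitrary assumption for illustration, and the `q` subquery is just a convenience to avoid repeating the term.

```sql
-- Assumes: CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
SELECT DISTINCT ON (t.id)
       t.id,
       t.description,
       u.lexeme,
       levenshtein(u.lexeme, q.term) AS distance
FROM things AS t
CROSS JOIN (SELECT 'color'::text AS term) AS q
-- one output row per lexeme of each description
CROSS JOIN LATERAL unnest(to_tsvector('english', t.description)) AS u
-- assumed cutoff: integer division, so 'color' (5 chars) allows 1 edit
WHERE levenshtein(u.lexeme, q.term) <= length(q.term) / 3
-- DISTINCT ON keeps the first row per id, i.e. the smallest distance
ORDER BY t.id, distance;
```

Going by the distances listed above, this should keep records 1 and 3 and drop record 2.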

The word similarity operators recently added to pg_trgm might work better.

select *, description <->> 'color' as distance from things order by description <->> 'color';

 id |    description     | distance 
----+--------------------+----------
  3 | Painting colors    | 0.166667
  1 | A red coloured car | 0.333333
  2 | The garden         |        1
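To filter rather than just sort, the `<%` operator from pg_trgm can be used in the `WHERE` clause; it is true when the word similarity exceeds `pg_trgm.word_similarity_threshold`. The following is a sketch only: the index name is made up, and the threshold shown is simply the documented default, which you would likely need to tune for your data.

```sql
-- Assumes: CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Optional trigram index so the <% filter can avoid a full scan
-- (things_description_trgm_idx is a hypothetical name).
CREATE INDEX things_description_trgm_idx
    ON things USING gin (description gin_trgm_ops);

-- 0.6 is the default; lower it to accept looser matches.
SET pg_trgm.word_similarity_threshold = 0.6;

SELECT id,
       description,
       description <->> 'color' AS distance
FROM things
WHERE 'color' <% description   -- word_similarity('color', description) > threshold
ORDER BY distance;
```

Given the distances shown above (0.166667 and 0.333333, i.e. similarities of roughly 0.83 and 0.67), records 3 and 1 should pass the default threshold, while record 2 should not.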

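Another option would be to find a stemmer or thesaurus that standardizes British/American spellings (I don't know of one that is readily available) and then not use fuzzy matching at all. I think that would be the best approach, if you can do it.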
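If no ready-made dictionary turns up, one way to approximate this is a hand-maintained synonym dictionary in a custom text search configuration. This is only a sketch under assumptions: the names `spelling_syn`, `en_spelling`, and the file `spelling_syn.syn` are hypothetical, and you would have to create that file yourself in `$SHAREDIR/tsearch_data/` with one mapping per line (for example `colour color`, `colours color`, `coloured color`). The synonym dictionary sees each word before stemming, so every inflected form you care about has to be listed.

```sql
-- Hypothetical dictionary backed by $SHAREDIR/tsearch_data/spelling_syn.syn
CREATE TEXT SEARCH DICTIONARY spelling_syn (
    TEMPLATE = synonym,
    SYNONYMS = spelling_syn
);

-- Copy the stock English configuration instead of altering it in place.
CREATE TEXT SEARCH CONFIGURATION en_spelling (COPY = english);

-- Try the synonym dictionary first, then fall back to the English stemmer.
ALTER TEXT SEARCH CONFIGURATION en_spelling
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH spelling_syn, english_stem;

-- Plain full text search, no fuzzy matching involved.
SELECT id, description
FROM things
WHERE to_tsvector('en_spelling', description)
      @@ to_tsquery('en_spelling', 'colour');
```

With those mappings in place, both the document text and the search term would be normalized to `color` before they are compared.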

Solution 1
1 2020-05-14 18:56:46
