How to add a unique constraint (ignoring special characters) on a text column in Postgres?

Question

How to add a unique constraint (ignoring special characters) on a text column in Postgres?

CREATE TABLE my_table(
    SomeTextColumn citext,
    CONSTRAINT person_u_1 UNIQUE (SomeTextColumn)
);

In the above table, I'm trying to add a unique constraint that checks uniqueness while ignoring special characters in the incoming data.

For example:
1. HelloWorld --> Gets inserted successfully
2. Hello World --> Should fail with duplicate constraint
3. Hello%$^&*W^%orld --> Should fail with duplicate constraint

Answer 1

You can create a unique index that implements the check:

create unique index t_txt_unique on t(regexp_replace(txt, '\W', '', 'g'));

The regexp removes all non-word characters from the string, retaining only alphanumeric characters and the underscore _. You can adjust the character class as needed.
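As a sketch of what adjusting the character class could look like: if the underscore should be ignored as well, a POSIX class such as [^[:alnum:]] can take the place of \W. The table t and column txt are the ones used in this answer; the index name t_txt_alnum_unique is only illustrative.

```sql
-- Preview the key each value would get (no table needed).
-- All three values normalize to 'HelloWorld'.
SELECT v.txt,
       regexp_replace(v.txt, '[^[:alnum:]]', '', 'g') AS normalized_key
FROM (VALUES ('Hello_World'), ('Hello World'), ('HelloWorld')) AS v(txt);

-- Variant of the index that also ignores underscores
-- (assumes table t with column txt; the index name is hypothetical).
CREATE UNIQUE INDEX t_txt_alnum_unique
    ON t (regexp_replace(txt, '[^[:alnum:]]', '', 'g'));
```

Whether the underscore should count as a "special character" depends on the data, so it is worth previewing a few real values before settling on a class.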

Demo on DB Fiddle:

create table t (id int, txt citext);
create unique index t_txt_unique on t(regexp_replace(txt, '\W', '', 'g'));

insert into t values(1, 'HelloWorld');
-- ok

insert into t values(1, 'Hello World');
-- ERROR:  duplicate key value violates unique constraint "t_txt_unique"
-- DETAIL:  Key (regexp_replace(txt, '\W'::text, ''::text, 'g'::text))=(HelloWorld) already exists.

insert into t values(1, 'Hello%$^&*W^%orld');
-- ERROR:  duplicate key value violates unique constraint "t_txt_unique"
-- DETAIL:  Key (regexp_replace(txt, '\W'::text, ''::text, 'g'::text))=(HelloWorld) already exists.

insert into t values(1, 'Hello Mars');
-- ok
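
If the table already holds data, CREATE UNIQUE INDEX fails as soon as two existing rows normalize to the same key. A query along the following lines may help to find such rows first. It is a sketch that assumes the demo table t and its txt column.

```sql
-- List groups of existing rows that share the same normalized key.
SELECT regexp_replace(txt, '\W', '', 'g') AS normalized_key,
       count(*)                           AS row_count,
       array_agg(txt)                     AS conflicting_values
FROM t
GROUP BY 1
HAVING count(*) > 1;
```

An empty result suggests the index can be created without hitting a duplicate-key error on the existing rows.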

Answer 2

The question is older, but I think some additional notes might be useful.

Uniqueness on text columns can be problematic because text comparison is case-sensitive (not only in PostgreSQL).
You can get duplicates because "HelloWorld" is not the same as "HELLOWORLD". Because of that, you might want to add the UPPER function when you create a UNIQUE INDEX on text fields of a database:
CREATE UNIQUE INDEX t_txt_unique ON t(REGEXP_REPLACE(UPPER(txt), '\W', '', 'g'));
You might want to use the following REGEXP_REPLACE pattern to keep only the characters 0-9 and A-Z (which also removes umlauts):
CREATE UNIQUE INDEX t_txt_unique ON t(REGEXP_REPLACE(UPPER(txt), '[^0-9A-Z]', '', 'g'));
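
One consequence of the [^0-9A-Z] pattern is that accented letters are dropped entirely rather than mapped to a base letter. The query below is a small illustration with made-up sample values:

```sql
-- Every character outside 0-9 and A-Z is removed after upper-casing,
-- so 'Bär', 'Br' and 'B-R' all end up with the key 'BR'.
SELECT v.txt,
       regexp_replace(upper(v.txt), '[^0-9A-Z]', '', 'g') AS normalized_key
FROM (VALUES ('Bär'), ('Br'), ('B-R')) AS v(txt);
```

With this key, 'Bär' and 'Br' would be treated as duplicates, which may or may not be what you want.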
If you want to map umlauts to their base characters (Ü --> U, Ä --> A, etc.) instead of dropping them, you could add the UNACCENT function ( https://www.postgresql.org/docs/current/unaccent.html ). It is an extension you need to add to PostgreSQL, and it might also help you when searching for something. Keep in mind that only a superuser or another suitably privileged user might be allowed to add extensions:
CREATE EXTENSION IF NOT EXISTS unaccent WITH SCHEMA public;
CREATE UNIQUE INDEX t_txt_unique ON t(REGEXP_REPLACE(UPPER(UNACCENT(txt)), '[^0-9A-Z]', '', 'g'));
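
One caveat worth checking before relying on the index above: unaccent() is declared STABLE rather than IMMUTABLE, so PostgreSQL is likely to reject it inside an index expression with "functions in index expression must be marked IMMUTABLE". A commonly used workaround is a thin IMMUTABLE wrapper. The sketch below assumes the extension lives in schema public; the names f_unaccent and t_txt_unaccent_unique are hypothetical and not part of the original answer.

```sql
-- Assumption: the extension is installed in schema public, as in the answer.
CREATE EXTENSION IF NOT EXISTS unaccent WITH SCHEMA public;

-- Hypothetical wrapper, declared IMMUTABLE so it can be used in an index.
-- Schema-qualifying both the function and the dictionary keeps the result
-- independent of the caller's search_path.
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text
LANGUAGE sql
IMMUTABLE STRICT
AS $$
    SELECT public.unaccent('public.unaccent', $1)
$$;

-- Same key as in the answer, but built on the wrapper
-- (assumes table t with column txt; the index name is illustrative).
CREATE UNIQUE INDEX t_txt_unaccent_unique
    ON t (regexp_replace(upper(f_unaccent(txt)), '[^0-9A-Z]', '', 'g'));
```

Marking the wrapper IMMUTABLE is a promise to the planner: if the unaccent rules file is changed later, the index would need to be rebuilt (REINDEX) to stay consistent.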

SQL code

--
-- Check REGEXP_REPLACE to remove unwanted characters.
-- Test-String: '^°!"§$%&/()=?`+#äöüÜÖÄ',.-_:;@<>ABµC123abc123'
--
SELECT '^°!"§$%&/()=?`+#äöüÜÖÄ'',.-_:;@<>ABµC123abc123' AS original_text
      ,REGEXP_REPLACE( UPPER( '^°!"§$%&/()=?`+#äöüÜÖÄ'',.-_:;@<>ABµC123abc123' ), '\W', '', 'g') AS replace_word
      ,REGEXP_REPLACE( UPPER( '^°!"§$%&/()=?`+#äöüÜÖÄ'',.-_:;@<>ABµC123abc123' ), '[^0-9A-Z]', '', 'g') AS replace_upper_keep_num_a_to_z
      ,REGEXP_REPLACE( UPPER(UNACCENT( '^°!"§$%&/()=?`+#äöüÜÖÄ'',.-_:;@<>ABµC123abc123' )), '[^0-9A-Z]', '', 'g') AS replace_keep_num_a_to_z_umlaut
;

Result of the above SQL code:

original_text: ^°!"§$%&/()=?`+#äöüÜÖÄ',.-_:;@<>ABµC123abc123
replace_word: ÄÖÜÜÖÄ_AB?C123ABC123
replace_upper_keep_num_a_to_z: ABC123ABC123
replace_keep_num_a_to_z_umlaut: AOUUOAABC123ABC123

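As you might have noticed, in the second column the character 'µ' has been transformed to an upper-case 'Μ' (shown as '?' in the result above, presumably an encoding issue), which looks a bit strange. This most likely comes from UPPER rather than from the '\W' replacement: the upper-case form of the micro sign is the Greek capital letter mu, which counts as a word character and is therefore kept by '\W' but removed by '[^0-9A-Z]'.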
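
To check how UPPER treats the micro sign on a given server, a query like the following can be used. It is only a sketch; the result of upper() depends on the database encoding and locale, so no particular output is assumed here.

```sql
-- ascii() returns the code point of the first character in a UTF8 database.
-- Comparing the two code points shows whether upper() changed the character.
SELECT upper('µ')        AS upper_micro,
       ascii('µ')        AS micro_code_point,
       ascii(upper('µ')) AS upper_code_point;
```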

Answer 1: score 5, accepted, 2020-01-13 22:38:20
Answer 2: score 0, 2021-07-21 16:43:12
