How to add a unique constraint (ignoring special characters) on a text column in Postgres?
CREATE TABLE my_table(
SomeTextColumn citext,
CONSTRAINT person_u_1 UNIQUE (SomeTextColumn)
);
In the above table, I'm trying to add a unique constraint that checks for uniqueness while ignoring special characters in the incoming data.
For example:
1. HelloWorld --> Gets inserted successfully
2. Hello World --> Should fail with duplicate constraint
3. Hello%$^&*W^%orld --> Should fail with duplicate constraint
You can create a unique index that implements the check:
create unique index t_txt_unique on t(regexp_replace(txt, '\W', '', 'g'));
The regexp removes all non-word characters from the string, retaining only alphanumeric characters and the underscore (_). You can adjust the character class as needed.
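To see which inputs end up with the same index key, here is a minimal Python sketch of the same normalization. The function name `uniqueness_key` is made up for illustration, and Python's `\W` is Unicode-aware, so it may not match PostgreSQL's `\W` exactly for non-ASCII text (that depends on the database locale and encoding).

```python
import re


def uniqueness_key(text: str) -> str:
    # Drop every non-word character, roughly what
    # regexp_replace(txt, '\W', '', 'g') does in the index expression.
    return re.sub(r"\W", "", text)


samples = [
    "HelloWorld",
    "Hello World",
    "Hello%$^&*W^%orld",
    "Hello Mars",
    "Hello_World",  # the underscore is a word character, so it is kept
]

for sample in samples:
    print(f"{sample!r} -> {uniqueness_key(sample)!r}")
```

The first three samples produce the same key, which is why the second and third inserts are rejected. `Hello_World` keeps its underscore and would be accepted; if that is not wanted, a stricter class such as `[^0-9A-Za-z]` could be used instead.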
Demo on DB Fiddle:
create table t (id int, txt citext);
create unique index t_txt_unique on t(regexp_replace(txt, '\W', '', 'g'));
insert into t values(1, 'HelloWorld');
-- ok
insert into t values(1, 'Hello World');
-- ERROR: duplicate key value violates unique constraint "t_txt_unique"
-- DETAIL: Key (regexp_replace(txt, '\W'::text, ''::text, 'g'::text))=(HelloWorld) already exists.
insert into t values(1, 'Hello%$^&*W^%orld');
-- ERROR: duplicate key value violates unique constraint "t_txt_unique"
-- DETAIL: Key (regexp_replace(txt, '\W'::text, ''::text, 'g'::text))=(HelloWorld) already exists.
insert into t values(1, 'Hello Mars');
-- ok
The question is older, but I think some additional notes might be useful...
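-- Three alternatives (create only one of them; they share the same index name).
-- 1. Case-insensitive; keeps word characters (letters, digits, underscore):
CREATE UNIQUE INDEX t_txt_unique ON t(REGEXP_REPLACE(UPPER(txt), '\W', '', 'g'));
-- 2. Case-insensitive; keeps only 0-9 and A-Z:
CREATE UNIQUE INDEX t_txt_unique ON t(REGEXP_REPLACE(UPPER(txt), '[^0-9A-Z]', '', 'g'));
-- 3. Like 2, but strips accents first. unaccent() is declared STABLE, so it has to be wrapped in an IMMUTABLE function before PostgreSQL accepts it in an index expression:
CREATE EXTENSION IF NOT EXISTS unaccent WITH SCHEMA public;
CREATE UNIQUE INDEX t_txt_unique ON t(REGEXP_REPLACE(UPPER(UNACCENT(txt)), '[^0-9A-Z]', '', 'g'));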
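The difference between keeping only 0-9 and A-Z with and without accent stripping can be sketched in Python. This is only an approximation: `strip_accents` is a hypothetical stand-in built on Unicode decomposition, whereas the real `unaccent` extension uses its own rules file and can give different results for some characters.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    # Decompose characters (NFKD) and drop the combining marks.
    # Rough stand-in for unaccent(); not the extension's actual rules.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def key_ascii(text: str) -> str:
    # Upper-case, then keep only 0-9 and A-Z.
    return re.sub(r"[^0-9A-Z]", "", text.upper())


def key_unaccented(text: str) -> str:
    # Strip accents first, then apply the same filter.
    return re.sub(r"[^0-9A-Z]", "", strip_accents(text).upper())


name = "M\u00fcller"  # "Müller", written with an escape so the form is unambiguous

print(key_ascii(name))       # the accented letter is dropped entirely
print(key_unaccented(name))  # the accented letter is folded to its base letter
```

Without accent stripping, the accented letter simply disappears from the key, so "Müller" and "Mller" would collide. With accent stripping, "Müller" and "Muller" collide instead, which is usually closer to what people expect.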
SQL code:
--
-- Check REGEXP_REPLACE to remove unwanted characters.
-- Test-String: '^°!"§$%&/()=?`+#äöüÜÖÄ',.-_:;@ABµC123abc123'
--
SELECT '^°!"§$%&/()=?`+#äöüÜÖÄ'',.-_:;@ABµC123abc123' AS original_text
,REGEXP_REPLACE( UPPER( '^°!"§$%&/()=?`+#äöüÜÖÄ'',.-_:;@ABµC123abc123' ), '\W', '', 'g') AS replace_word
,REGEXP_REPLACE( UPPER( '^°!"§$%&/()=?`+#äöüÜÖÄ'',.-_:;@ABµC123abc123' ), '[^0-9A-Z]', '', 'g') AS replace_upper_keep_num_a_to_z
,REGEXP_REPLACE( UPPER(UNACCENT( '^°!"§$%&/()=?`+#äöüÜÖÄ'',.-_:;@ABµC123abc123' )), '[^0-9A-Z]', '', 'g') AS replace_keep_num_a_to_z_umlaut
;
Result of the above SQL code, shown as an HTML table because it was too complicated for a markdown table in the final post (the preview was working):
<table border="1"><tr BGCOLOR="#CCCCFF"><th>original_text</th><th>replace_word</th><th>replace_upper_keep_num_a_to_z</th><th>replace_keep_num_a_to_z_umlaut</th></tr> <tr><td>^°!"§$%&/()=?`+#äöüÜÖÄ',.-_:;@<>ABµC123abc123</td><td>ÄÖÜÜÖÄ_AB?C123ABC123</td><td>ABC123ABC123</td><td>AOUUOAABC123ABC123</td></tr> </table>
As you might have noticed, the character 'µ' has been transformed to an upper-case 'M' in the second column (the one using the '\W' replacement), which is a bit strange.
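A likely explanation, assuming the database's UPPER follows Unicode case mapping: the upper-case form of the micro sign is the Greek capital letter Mu, which looks like a Latin 'M' but is a different character. Since it is a letter, `\W` does not remove it. The Python sketch below shows the same mapping; it illustrates the Unicode rule and is not a statement about any particular PostgreSQL locale.

```python
import re
import unicodedata

micro = "\u00b5"       # MICRO SIGN, the 'µ' from the test string
upper = micro.upper()  # Unicode upper-case mapping

print(unicodedata.name(micro))
print(unicodedata.name(upper))

# The result looks like 'M' but is not the Latin letter.
print(upper == "M")

# It is still a word character, so a \W filter keeps it,
# while an explicit A-Z class removes it.
print(re.sub(r"\W", "", "AB" + upper + "C"))
print(re.sub(r"[^0-9A-Z]", "", "AB" + upper + "C"))
```

This matches the third and fourth columns of the result table, where the explicit `[^0-9A-Z]` class drops the character and leaves `ABC`.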