
PostgreSQL Query to search multiple words in multiple table columns storing PII Data

Is there an efficient way to search for multiple words (around 50 of them) occurring in table fields that contain PII data?

As of now I have this query:

 SELECT reporter FROM case_detail cd WHERE (cd.reporter)::text IN (‘ID_ISSUE_PLACE’,‘USER_NAME’,‘STATE’,‘PROOFTYPE_1’,‘ID_EXPIRES_ALLOWED’,‘KIN_MIDDLENAME’,‘EMAIL’,‘GENDER’,‘COUNTRY’,‘PROOFTYPE_2’,‘RESIDENCE_COUNTRY’,‘KIN_AGE’,‘ID_TYPE’,‘CONTACT_PERSON’,‘BIRTHPLACE’,‘PROOFTYPE_3’,‘NATIONALITY’,‘KIN_CONTACTNUMBER’,‘ID_NO’,‘CONTACT_NO’,‘NOMINATION_DETAIL’,‘MIDDLE_NAME’,‘EMPLOYER_NAME’,‘KIN_NATIONALITY’,‘SSN’,‘MSISDN’,‘IMSI’,‘LAST_NAME’,‘DOB’,‘POSTAL_CODE’,‘KIN_NATIONALITYNO’,‘ADDRESS1’,‘DIST_MSISDN’,‘ID_ISSUE_DATE’,‘REFERENCEID’,‘KIN_RELATIONSHIP’,‘ADDRESS2’,‘RET_MSISDN’,‘ID_ISSUE_COUNTRY’,‘REGION’,‘SOURCE_OF_INCOME’,‘CITY’,‘BUSINESS_NAME’,‘ID_EXPIRY_DATE’,‘KIN_FIRSTNAME’,‘ORGANIZATION_NAME’,‘PROOFID_1’,‘PROOFID_2’,'PROOFID_3’,'KIN_LASTNAME’,‘Wallet')

This throws a SQL syntax error.

We need to encrypt the data contained in these table fields.

Any help would be greatly appreciated.

I also tried this:

SELECT reporter FROM case_detail WHERE reporter CONTAINS 'ID_ISSUE_PLACE And USER_NAME And STATE And PROOFTYPE_1 And ID_EXPIRES_ALLOWED And KIN_MIDDLENAME And EMAIL And GENDER And COUNTRY And PROOFTYPE_2 And RESIDENCE_COUNTRY And KIN_AGE And ID_TYPE And CONTACT_PERSON And BIRTHPLACE And PROOFTYPE_3 And NATIONALITY And KIN_CONTACTNUMBER And ID_NO And CONTACT_NO And NOMINATION_DETAIL And MIDDLE_NAME And EMPLOYER_NAME And KIN_NATIONALITY And SSN And MSISDN And IMSI And LAST_NAME And DOB And POSTAL_CODE And KIN_NATIONALITYNO And ADDRESS1 And DIST_MSISDN And ID_ISSUE_DATE And REFERENCEID And KIN_RELATIONSHIP And ADDRESS2 And RET_MSISDN And ID_ISSUE_COUNTRY And REGION And SOURCE_OF_INCOME And CITY And BUSINESS_NAME And ID_EXPIRY_DATE And KIN_FIRSTNAME And ORGANIZATION_NAME And PROOFID_1 And PROOFID_2 And PROOFID_3 And KIN_LASTNAME And Wallet';

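You can store reporter as a tsvector, which effectively converts the text into an indexable array of words. A separate, automatically generated tsvector column can be added to the table with the following query.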

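-- Generated columns need PostgreSQL 12+; the expression must be parenthesized and marked STORED.
ALTER TABLE case_detail ADD COLUMN
text_search tsvector GENERATED ALWAYS AS (to_tsvector('english', reporter)) STORED;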

You can then search for the relevant cases using the @@ match operator with an appropriate tsquery.

SELECT reporter
FROM case_detail cd
WHERE cd.text_search @@ to_tsquery('english', 'word1 & word2 & ... & wordn');
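
The & operator above requires every word to be present. The IN list in the question reads more like an "any of these" search, so here is a minimal sketch of that variant using |, the OR operator in tsquery syntax. It assumes the text_search column defined earlier, and the three keywords are just examples picked from the question's list.

```sql
-- Sketch: return rows whose reporter text contains ANY of the keywords.
-- Assumes the generated text_search column from the ALTER TABLE above.
-- Keywords without underscores are used here to keep each term a single word.
SELECT reporter
FROM case_detail cd
WHERE cd.text_search @@ to_tsquery('english', 'EMAIL | SSN | MSISDN');
```

Passing the same 'english' configuration that built the column keeps the query terms normalized the same way as the stored lexemes.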

You can even add a GIN index to the table to improve text search performance on large tables.

CREATE INDEX case_detail_text_search_idx ON case_detail USING GIN (text_search);

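One thing you should be aware of, however: since underscores are not part of regular words, postgres assumes that pieces of text separated by underscores are each their own word, so 'USER_NAME' and 'USER' & 'NAME' are treated the same. If keeping the underscores is a must, and your text search requirement is limited to checking that one set of keywords contains another set of keywords, the text can instead be stored as a text array. When creating the generated column, just replace to_tsvector('english', reporter) with regexp_split_to_array(reporter, '\s+') and use @> ARRAY['word1', 'word2', ...] in the select. A GIN index can be created on a text[] column just as on a tsvector.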
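
The text[] alternative described above can be sketched as follows. The column name reporter_words and the index name are hypothetical, and the sketch assumes reporter is a text column (add a cast such as reporter::text otherwise) and PostgreSQL 12 or later for generated columns.

```sql
-- Quick check of how the 'english' configuration tokenizes an underscored keyword.
SELECT to_tsvector('english', 'USER_NAME');

-- Hypothetical column: reporter split on whitespace, underscores left intact.
ALTER TABLE case_detail ADD COLUMN
reporter_words text[] GENERATED ALWAYS AS (regexp_split_to_array(reporter, '\s+')) STORED;

-- GIN index on the array column (hypothetical index name).
CREATE INDEX case_detail_reporter_words_idx ON case_detail USING GIN (reporter_words);

-- Rows whose reporter contains ALL of the listed keywords.
SELECT reporter
FROM case_detail
WHERE reporter_words @> ARRAY['USER_NAME', 'EMAIL'];
```

Unlike the tsvector approach, array matching is exact and case-sensitive, so 'user_name' would not match 'USER_NAME'. To match rows containing any of the keywords rather than all of them, the overlap operator && can be used in place of @>.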
