
Best way to check for duplicated records

I have two tables, A and B, with a one-to-many relationship from A to B.

A has 5 columns:

a1, a2, a3, a4, a5

and B has 5 columns:

b1, b2, b3, b4, a1

Note that a1 is a foreign key in table B.

I have a requirement to check for duplicate records in the table, i.e. no two records should have exactly the same values for all attributes.

The most efficient way I can think of to determine uniqueness is to create a checksum-like value and keep it in every row of table A. But this requires extra space, and I would have to make sure that the checksum is really unique.

Is this the best way to go, or is there some other way I am unaware of?

For example, say table A is a Rules table and table B is a Trigger table. The Rules table holds rules created by different users, which means it has a mapping to a Users table. What I actually want is that a user should not be able to create identical rules. So when a user saves a rule, I run a query to check whether a record with an identical checksum already exists for that particular user. If it does, I return an appropriate error; otherwise I let the user create the record. I guess this clarifies why I can't put a unique constraint on all records.

Do a SELECT with a GROUP BY clause. For example:

SELECT a1, a2, a3, a4, a5, COUNT(*) FROM #TempPersons GROUP BY a1, a2, a3, a4, a5 HAVING COUNT(*) > 1;

This returns each combination of a1, a2, a3, a4, a5 that occurs more than once, together with a count of how many times that combination appears.
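To try the query without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the table definition and the sample rows are assumptions made for illustration; only the GROUP BY / HAVING pattern comes from the answer above.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a1 INTEGER, a2 TEXT, a3 TEXT, a4 TEXT, a5 TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", "y", "z", "w"),
        (1, "x", "y", "z", "w"),  # exact duplicate of the row above
        (2, "x", "y", "z", "w"),  # differs in a1, so not a duplicate
    ],
)

# Group on every column; only groups with more than one row are duplicates.
duplicates = conn.execute(
    """
    SELECT a1, a2, a3, a4, a5, COUNT(*) AS occurrences
    FROM A
    GROUP BY a1, a2, a3, a4, a5
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)  # [(1, 'x', 'y', 'z', 'w', 2)]
```

Note that this finds duplicates after they already exist; it does not stop them from being inserted.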

Having a UNIQUE constraint on those columns seems like the way to go.
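Since the question asks for uniqueness per user, one reading of this suggestion is a composite constraint that includes the user column. The sketch below uses SQLite through Python's sqlite3 module; the `rules` table, its column names and the `save_rule` helper are hypothetical, and only two attribute columns are shown to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rules (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        a2      TEXT NOT NULL,
        a3      TEXT NOT NULL,
        -- The same values may repeat across users, but not within one user.
        UNIQUE (user_id, a2, a3)
    )
    """
)

def save_rule(user_id, a2, a3):
    """Return True if the rule was stored, False if it duplicates an existing one."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO rules (user_id, a2, a3) VALUES (?, ?, ?)",
                (user_id, a2, a3),
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the row because of the UNIQUE constraint.
        return False

first = save_rule(1, "x", "y")
second = save_rule(1, "x", "y")  # same user, same values
other = save_rule(2, "x", "y")   # different user, same values
print(first, second, other)  # True False True
```

The columns are declared NOT NULL here because the treatment of NULL inside unique constraints differs between database engines.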

However, just for the sake of answering your other remarks: I have used extra columns to check for changes before. Back then I did something similar to this:

CONVERT([NVARCHAR](42), HASHBYTES('SHA1', CONCAT(Column1, '||', Column2, ...)), (1))

I found it to be a rather nice way to concatenate many columns into a single hash that depends on its contents without growing out of proportion. (I used this in a data warehousing environment to check large tables for record-level changes based on a business key. I stored it as a PERSISTED column so that it could be indexed as well.)
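The same idea can be sketched outside the database. The Python version below is only an illustration: the `row_fingerprint` name is hypothetical, and because it hashes UTF-8 bytes its output should not be expected to match what SQL Server's HASHBYTES produces for the same values. It mirrors the `'||'` delimiter and the 42-character `0x…` format of the expression above.

```python
import hashlib

def row_fingerprint(values, delimiter="||"):
    """Hash a row's column values into one fixed-length hex string."""
    # Assumption: None is treated as an empty string, which is also
    # how CONCAT handles NULL in SQL Server.
    parts = ["" if v is None else str(v) for v in values]
    joined = delimiter.join(parts)
    return "0x" + hashlib.sha1(joined.encode("utf-8")).hexdigest().upper()

fp1 = row_fingerprint(["x", "y", 3])
fp2 = row_fingerprint(["x", "y", 3])
fp3 = row_fingerprint(["x", "y", 4])
print(len(fp1), fp1 == fp2, fp1 == fp3)  # 42 True False

# Different rows can still share a fingerprint when a value contains the
# delimiter: both rows below concatenate to the string "a||||b".
print(row_fingerprint(["a||", "b"]) == row_fingerprint(["a", "||b"]))  # True
```

Because of cases like the last one, a matching fingerprint is best treated as a candidate duplicate and confirmed by comparing the actual column values.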

