
How to deal with duplicates in a database?

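In a program, should we use try/catch to detect that a duplicate value exists in the table, or should we check whether the value already exists in the table and avoid the insert?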

This is easy enough to enforce with a UNIQUE constraint on the database side, so that's my recommendation. I try to put as much of the data integrity as possible into the database so that I can avoid having bad data (although sometimes it is unavoidable).

If that is already how your table is set up, you might as well just catch the MySQL exception raised when a duplicate value is inserted into such a table: doing the check and then the insertion is more costly than having the database do one simple lookup (and possibly an insert).
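As a rough illustration of "insert and catch the exception", here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere; the `login` table layout and the `add_login` helper are assumptions made for the example, not something from the question. With a MySQL driver the same pattern applies, but the exception class depends on the driver (the server reports duplicate-key violations as error 1062).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE login ("
    " loginid INTEGER PRIMARY KEY,"
    " loginname TEXT NOT NULL UNIQUE)"
)


def add_login(name):
    """Hypothetical helper: return True if inserted, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute("INSERT INTO login (loginname) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # Raised here when the UNIQUE constraint rejects the row.
        return False


first = add_login("alice")    # new name
second = add_login("alice")   # same name again
row_count = conn.execute("SELECT COUNT(*) FROM login").fetchone()[0]
print(first, second, row_count)
```

The database does a single lookup as part of the insert, and the application only has to handle the failure case.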

It depends on whether you are inserting one row or a million, as well as whether the duplicate is the primary key.

If it's the primary key, read: http://database-programmer.blogspot.com/2009/06/approaches-to-upsert.html

An UPSERT or ON DUPLICATE KEY... The idea behind an UPSERT is simple. The client issues an INSERT command. If a row already exists with the given primary key, then instead of throwing a key violation error, it takes the non-key values and updates the row.

This is one of those strange (and very unusual) cases where MySQL actually supports something you will not find in all of the other more mature databases. So if you are using MySQL, you do not need to do anything special to make an UPSERT. You just add the term "ON DUPLICATE KEY UPDATE" to the INSERT statement.
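To make the UPSERT idea concrete, here is a small sketch. It is not the example from the linked article. It runs against SQLite through Python's `sqlite3` module, where the equivalent clause is `ON CONFLICT ... DO UPDATE` (this assumes an SQLite build of version 3.24 or later); the MySQL spelling is shown in a comment. The `users` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT NOT NULL)")

# Roughly equivalent MySQL statement (not executed here):
#   INSERT INTO users (username, email) VALUES ('alice', 'new@example.com')
#   ON DUPLICATE KEY UPDATE email = VALUES(email);
UPSERT = (
    "INSERT INTO users (username, email) VALUES (?, ?) "
    "ON CONFLICT(username) DO UPDATE SET email = excluded.email"
)

conn.execute(UPSERT, ("alice", "old@example.com"))  # no row yet: plain insert
conn.execute(UPSERT, ("alice", "new@example.com"))  # key exists: row is updated

email = conn.execute(
    "SELECT email FROM users WHERE username = ?", ("alice",)
).fetchone()[0]
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(email, user_count)
```

The second statement does not raise a key violation; it overwrites the non-key column of the existing row instead.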

If it's not the primary key, and you are inserting just one row, then you can still make sure this doesn't cause a failure.

For your actual question, I don't really like the idea of using try/catch for program flow, but really, you have to evaluate readability and user experience (in this case performance), and pick what you think is the best mix of the two.

You can add a UNIQUE constraint to your table. Something like:

CREATE TABLE IF NOT EXISTS login
(
    loginid SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    loginname CHAR(20) NOT NULL,
    UNIQUE (loginname) 
);

This will ensure no two login names are the same.

You can create a unique composite key:

ALTER TABLE `TableName` ADD UNIQUE KEY (KeyOne, KeyTwo, ...);

You only need to create a unique key on the table so that it does not allow the same value to be added again.
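A composite unique key rejects a repeated combination of values while still allowing each individual value to repeat. The sketch below illustrates that behaviour with Python's `sqlite3` module as a stand-in for MySQL. SQLite has no `ALTER TABLE ... ADD UNIQUE KEY`, so a unique index plays the same role here; the `enrollment` table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (student_id INTEGER NOT NULL, course_id INTEGER NOT NULL)"
)
# Stand-in for: ALTER TABLE enrollment ADD UNIQUE KEY (student_id, course_id);
conn.execute(
    "CREATE UNIQUE INDEX uq_enrollment ON enrollment (student_id, course_id)"
)

conn.execute("INSERT INTO enrollment VALUES (1, 10)")
conn.execute("INSERT INTO enrollment VALUES (1, 20)")  # same student, other course: allowed
conn.execute("INSERT INTO enrollment VALUES (2, 10)")  # other student, same course: allowed

try:
    conn.execute("INSERT INTO enrollment VALUES (1, 10)")  # exact pair repeated
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(duplicate_rejected, rows)
```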

You should try inserting the value and catch the exception. In a busy system, if you check for the existence of a value first, another client might insert it between the time you check and the time you insert.

Let the database do its job: let the database check for the duplicate entry.
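The timing problem can be simulated deterministically. The sketch below interleaves the steps of two imaginary clients, A and B, by hand on a single SQLite connection (Python's `sqlite3` module, used as a stand-in for a real multi-user server); the table and the `name_exists` helper are assumptions for illustration. Client A's check reports "no duplicate", yet its insert still collides, and only the UNIQUE constraint stops the duplicate row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login (loginname TEXT NOT NULL UNIQUE)")


def name_exists(name):
    """Hypothetical pre-insert check, as an application might write it."""
    row = conn.execute(
        "SELECT 1 FROM login WHERE loginname = ?", (name,)
    ).fetchone()
    return row is not None


# Step 1: client A checks and sees no existing row.
a_saw_duplicate = name_exists("alice")

# Step 2: client B inserts the same name before A gets to its insert.
conn.execute("INSERT INTO login (loginname) VALUES (?)", ("alice",))

# Step 3: client A inserts, trusting its now-stale check.
try:
    conn.execute("INSERT INTO login (loginname) VALUES (?)", ("alice",))
    a_result = "inserted"
except sqlite3.IntegrityError:
    a_result = "duplicate"

rows = conn.execute("SELECT COUNT(*) FROM login").fetchone()[0]
print(a_saw_duplicate, a_result, rows)
```

Without the constraint, step 3 would have succeeded and the table would hold two identical names, so the exception handler is needed even when the application checks first.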

A database is a computerized representation of a set of business rules, and a DBMS is used to enforce these business rules as constraints. Neither can verify that a proposition in the database is true in the real world. For example, if the model in question is the employees of an enterprise and the Employees table contains two people named 'Jimmy Barnes', neither the DBMS nor the database can know whether one is a duplicate, whether either is a real person, etc. A trusted source is required to determine existence and identity. In the above example, the enterprise's personnel department is responsible for checking public records, perusing references, ensuring the person is not already on the payroll, etc., and then allocating a unique employee reference number that can be used as a key. This is why we look for industry-standard identifiers with a trusted source: ISBN for books, VIN for cars, ISO 4217 for currencies, ISO 3166 for countries, etc.

I think it is better to check whether the value already exists and avoid the insertion. The check for duplicate values can be done in the procedure that saves the data (using EXISTS if your database is an SQL database).

If a duplicate exists, you skip the insertion and return a value to your app indicating so, and the app can then show a message accordingly.

For example, a piece of SQL code could be something like this:

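    -- Fragment of a save procedure; @param_ln and @param_fn are its parameters.
    DECLARE @ret_val INT = 0;   -- 0 = inserted, -1 = duplicate

    IF EXISTS (SELECT 1 FROM employee
               WHERE last_name = @param_ln AND first_name = @param_fn)
        SET @ret_val = -1;
    ELSE
    BEGIN
        -- your insert statement here, for example:
        INSERT INTO employee (last_name, first_name)
        VALUES (@param_ln, @param_fn);
    END

    SELECT @ret_val;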

Your condition for duplicate values will depend on what you define as a duplicate record. In your application you would use the return value to know if the data was a duplicate. Good luck!
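On the application side, the return-value idea might look like the sketch below. It uses Python's `sqlite3` module rather than a stored procedure, and the `save_employee` function, the table layout and the 0 / -1 convention are assumptions that mirror the SQL above. The check and the insert run inside one `BEGIN IMMEDIATE` transaction, which in SQLite takes the write lock up front so that no other writer can slip in between them; other databases need their own locking or a UNIQUE constraint to get the same guarantee.

```python
import sqlite3


def save_employee(conn, last_name, first_name):
    """Hypothetical save routine: return 0 if inserted, -1 if a duplicate exists."""
    conn.execute("BEGIN IMMEDIATE")  # take SQLite's write lock before checking
    try:
        row = conn.execute(
            "SELECT 1 FROM employee WHERE last_name = ? AND first_name = ?",
            (last_name, first_name),
        ).fetchone()
        if row is not None:
            ret_val = -1
        else:
            conn.execute(
                "INSERT INTO employee (last_name, first_name) VALUES (?, ?)",
                (last_name, first_name),
            )
            ret_val = 0
        conn.execute("COMMIT")
        return ret_val
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None: the sqlite3 module opens no implicit transactions,
# so the explicit BEGIN/COMMIT above are the only ones in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE employee (last_name TEXT NOT NULL, first_name TEXT NOT NULL)"
)

first = save_employee(conn, "Barnes", "Jimmy")
second = save_employee(conn, "Barnes", "Jimmy")
message = "Saved." if second == 0 else "That employee already exists."
print(first, second, message)
```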
