Best way to handle duplicated rows

I have an insurance company "dictionary" table in my database, for example:

+----+-------------------+----------+
| ID | Name              | Data     |
+----+-------------------+----------+
| 1  | InsuranceCompany1 | SomeData |
+----+-------------------+----------+

I also fetch data from another system, and as a result I end up with duplicates of insurance companies that lack my data:

+----+-------------------+----------+
| ID | Name              | Data     |
+----+-------------------+----------+
| 1  | InsuranceCompany1 | SomeData |
+----+-------------------+----------+
| 2  | InsuranceCompany1 |          |
+----+-------------------+----------+

Both records are referenced by a variety of models, but they describe the same company. I want to pair these records without changing queries or data in other tables, so that no one notices there are two records and both resolve to a single instance, which is:

 +----+-------------------+----------+
 | 1  | InsuranceCompany1 | SomeData |
 +----+-------------------+----------+

My question is: is there a proper way to handle situations like this? The solution I've come up with is to add a parent_id column, manually set parent_id on the duplicated rows, and then override Eloquent methods such as find in the model to return the parent when parent_id is set.

Copying the SomeData column is not an option, because there can be conditions such as insurance_company_id == id.

You can try creating a view of your dict table, something like this:

  -- One row per Name: lowest ID, plus every non-NULL Data value
  CREATE VIEW unique_dict AS
  SELECT MIN(ID) AS ID,
         Name,
         GROUP_CONCAT(Data) AS Data
    FROM dict
   GROUP BY Name;

That will give you one row per name.

Then, in your queries requiring one row per name, SELECT from the unique_dict view rather than the dict table.

GROUP_CONCAT() yields a list of values from Data, which helps if more than one duplicated row contains a value: you get them all.
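
To check what the view returns for the sample rows in the question, here is a minimal sketch that runs the same statement against an in-memory SQLite database from Python. SQLite is only a stand-in so the example is self-contained (the question itself is about MySQL and Laravel), and the blank Data cell in row 2 is assumed to be NULL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dict (ID INTEGER PRIMARY KEY, Name TEXT, Data TEXT);

    -- Sample rows from the question; the blank Data cell is assumed to be NULL.
    INSERT INTO dict (ID, Name, Data) VALUES
        (1, 'InsuranceCompany1', 'SomeData'),
        (2, 'InsuranceCompany1', NULL);

    CREATE VIEW unique_dict AS
    SELECT MIN(ID) AS ID,
           Name,
           GROUP_CONCAT(Data) AS Data
      FROM dict
     GROUP BY Name;
""")

rows = conn.execute("SELECT ID, Name, Data FROM unique_dict").fetchall()
print(rows)  # [(1, 'InsuranceCompany1', 'SomeData')]
```

GROUP_CONCAT() skips NULL values but not empty strings. If the duplicate rows store '' rather than NULL in Data, wrapping the column as GROUP_CONCAT(NULLIF(Data, '')) keeps stray separators out of the result.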

Longer term, it would be wise to treat these duplicates as "dirty data" and clean them up as you INSERT new rows. How to do that?

Create a unique index on Name.

CREATE UNIQUE INDEX unique_name ON dict(Name);

Then, when loading new data into dict, use Eloquent's updateOrCreate() function. Here's something to read about that: Laravel 5.1 Create or Update on Duplicate
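
The combination of a unique index and an update-or-create load step can be sketched outside Laravel as well. The example below uses Python with an in-memory SQLite database as a stand-in (an assumption, not the original stack). The helper update_or_create is hypothetical and only loosely mirrors the behaviour of Eloquent's updateOrCreate(); it is not Eloquent's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dict (ID INTEGER PRIMARY KEY, Name TEXT, Data TEXT);
    CREATE UNIQUE INDEX unique_name ON dict(Name);
    INSERT INTO dict (Name, Data) VALUES ('InsuranceCompany1', 'SomeData');
""")


def update_or_create(conn, name, data=None):
    """Hypothetical helper: return the ID of the row for `name`, creating it if absent."""
    row = conn.execute("SELECT ID FROM dict WHERE Name = ?", (name,)).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO dict (Name, Data) VALUES (?, ?)", (name, data)
        )
        return cur.lastrowid
    if data is not None:
        # Only overwrite Data when the import actually supplies a value.
        conn.execute("UPDATE dict SET Data = ? WHERE ID = ?", (data, row[0]))
    return row[0]


# Importing a company that already exists reuses its row instead of duplicating it.
first_id = update_or_create(conn, "InsuranceCompany1")
# Importing a new company creates exactly one row.
second_id = update_or_create(conn, "InsuranceCompany2")
count = conn.execute("SELECT COUNT(*) FROM dict").fetchone()[0]

# A plain INSERT of a duplicate name is rejected by the unique index.
try:
    conn.execute("INSERT INTO dict (Name) VALUES ('InsuranceCompany1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The lookup followed by an insert is not atomic, so two concurrent imports could both miss the row. The unique index is what finally guarantees that only one of them succeeds.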
