简体   繁体   中英

Rails remove duplicated associated records

Let's say I have a User and User has_many :tags and I would like to remove all @users tags that have duplicated name . For example,

@user.tags #=> [<Tag name: 'A'>, <Tag name: 'A'>, <Tag name: 'B'>]

I would like to keep only the tags with unique names and delete the rest from the database.

I know I could pull out a list of unique tags names from user's tags and remove all users's tags and re-create user's tags with only unique names but it would be ineffficient?

On the other hand, select won't work as it returns only the selected column. uniq also won't work:

@user.tags.uniq #=> returns all tags

Is there a more efficient way?

UPDATE: I would like to do this in a migration.

This method will give you an ActiveRecord::Relation with the duplicate tags:

class Tag < ApplicationRecord
  belongs_to :user

  def self.duplicate_tags
    unique = self.select('DISTINCT ON(tags.name, tags.user_id) tags.id')
     .order(:name, :user_id, :id)
    self.where.not(id: unique)
  end
end

Its actually run as a single query:

SELECT  "tags".* FROM "tags" 
WHERE "tags"."id" NOT IN 
 (SELECT DISTINCT ON(tags.name) tags.id 
  FROM "tags" GROUP BY "tags"."id", "tags"."user_id" 
  ORDER BY tags.name, tags.id)

You can remove the duplicates in a single query with #delete_all .

# Warning! This can't be undone!
Tag.duplicate_tags.destroy_all

If you need to destroy dependent associations or call your before_* or after_destroy callbacks, use the #destroy_all method instead. But you should use this together with #in_batches to avoid running out of memory.

# Warning! This can't be undone!
Tag.duplicate_tags.in_batches do |batch|
  # destroys a batch of 1000 records
  batch.destroy_all
end

You can write SQL model-independent query in the migration. Here is PostgreSQL-specific migration code:

execute <<-SQL
  DELETE FROM tags
  WHERE id NOT IN (
    SELECT DISTINCT ON(user_id, name) id FROM tags
    ORDER BY user_id, name, id ASC
  )
SQL

And here is more SQL common one:

execute <<-SQL
  DELETE FROM tags
  WHERE id IN (
    SELECT DISTINCT t2.id FROM tags t1
    INNER JOIN tags t2
    ON (
      t1.user_id = t2.user_id AND
      t1.name = t2.name AND
      t1.id < t2.id
    )
  )
SQL

This SQL fiddle shows different queries you can use as sub-select in DELETE query depending on your goals: deleting first/last/all duplicates.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM