
Ruby on Rails PostgreSQL Active Record parallel update

I have a model called AisSignal with about 3,000 records, and I check each one against another model called Footprint with about 10 records, so the loop runs 3,000 x 10 times.

I tried:

Parallel.each(AisSignal.all, in_processes: 8) do |signal|
  Footprint.all.each do |footprint|
    if footprint.cover([signal.lon, signal.lat])
      signal.update(imo: 'in')
      break
    end
  end
end

But it runs in 10 seconds, the same as a plain loop.

I tried switching from processes to threads, as below, but this causes the application to freeze.

Parallel.each(AisSignal.all, in_threads: 8) do |signal|
  Footprint.all.each do |footprint|
    if footprint.cover([signal.lon, signal.lat])
      signal.update(imo: 'in')
      break
    end
  end
end

I have a pool size of 50 in database.yml.

Is there any approach for running multiple threads in parallel to update records? I will actually need to update many more records, which can take minutes.

Threads and forks often don't play well with database connections. If not handled correctly, the threads/processes can wind up trying to use the same connection at the same time.

Parallel mentions this in their documentation. You need to make use of connection pooling.

A connection pool synchronizes thread access to a limited number of database connections. The basic idea is that each thread checks out a database connection from the pool, uses that connection, and checks the connection back in. ConnectionPool is completely thread-safe, and will ensure that a connection cannot be used by two threads at the same time, as long as ConnectionPool's contract is correctly followed. It will also handle cases in which there are more threads than connections: if all connections have been checked out, and a thread tries to checkout a connection anyway, then ConnectionPool will wait until some other thread has checked in a connection.

Parallel.each(AisSignal.all, in_threads: 8) do |signal|
  ActiveRecord::Base.connection_pool.with_connection do
    Footprint.all.each do |footprint|
      if footprint.cover([signal.lon, signal.lat])
        signal.update(imo: 'in')
        break
      end
    end
  end
end
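
The checkout/check-in contract quoted above can be illustrated without a database. The following is only a toy sketch built on Ruby's thread-safe Queue; ToyPool and its "conn-N" strings are hypothetical stand-ins, not how ActiveRecord implements its pool. It shows eight threads sharing two connections, with each thread blocking until a connection is checked back in.

```ruby
# Toy connection pool (illustration only, not ActiveRecord's implementation).
class ToyPool
  def initialize(size)
    @available = Queue.new
    size.times { |i| @available << "conn-#{i}" }
  end

  # Check a connection out, yield it, and always check it back in.
  def with_connection
    conn = @available.pop # blocks while every connection is checked out
    yield conn
  ensure
    @available << conn if conn
  end
end

pool = ToyPool.new(2)
in_use = 0
max_in_use = 0
lock = Mutex.new

threads = 8.times.map do
  Thread.new do
    pool.with_connection do |_conn|
      lock.synchronize do
        in_use += 1
        max_in_use = [max_in_use, in_use].max
      end
      sleep 0.01 # stand-in for a database query
      lock.synchronize { in_use -= 1 }
    end
  end
end
threads.each(&:join)
```

After the threads finish, max_in_use never exceeds the pool size of 2, even though 8 threads ran. This is also why more threads than connections do not speed anything up: the extra threads simply wait.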

Note that this code is very inefficient.

  1. It loads the entire AisSignal table.
  2. For each signal, it loads and scans the entire Footprint table.

It will use a lot of memory, and it will run in s*f time, where s is the number of signals and f is the number of footprints.
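
To make the s*f cost concrete, here is a minimal plain-Ruby sketch. Signal and Box are hypothetical Struct stand-ins for the models, and the bounding-box cover? is an assumption, since the question does not show how Footprint#cover works. The footprints are fetched once before the loop rather than once per signal, and a counter records how many coverage checks run.

```ruby
# Hypothetical stand-ins for AisSignal and Footprint (no database involved).
Signal = Struct.new(:lon, :lat, :imo)
Box = Struct.new(:min_lon, :min_lat, :max_lon, :max_lat) do
  # Assumed semantics: a footprint covers a point inside its bounding box.
  def cover?(point)
    lon, lat = point
    lon.between?(min_lon, max_lon) && lat.between?(min_lat, max_lat)
  end
end

# Loaded once, outside the loop, instead of once per signal.
footprints = [Box.new(0, 0, 10, 10), Box.new(20, 20, 30, 30)]
signals = [Signal.new(5, 5), Signal.new(25, 25), Signal.new(50, 50)]

checks = 0
signals.each do |signal|
  # any? stops at the first covering footprint, like the `break` in the original.
  covered = footprints.any? do |footprint|
    checks += 1
    footprint.cover?([signal.lon, signal.lat])
  end
  signal.imo = 'in' if covered
end
```

With 3 signals and 2 footprints, the worst case is 3 * 2 = 6 checks; here the early exit brings it to 5. Hoisting the footprint load removes the repeated table reads, but the number of checks still grows with s*f, which is why pushing the test into the database is the real fix.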

You can reduce the memory footprint by replacing Footprint.all.each with Footprint.find_each. This will load rows in batches.

Threading is not how you make database queries faster. The fundamental problem is that you're scanning Footprint multiple times in Ruby rather than letting the database do it. The check if footprint.cover([signal.lon, signal.lat]) should instead be a where clause.

AisSignal.find_each do |signal|
  # With ... being the equivalent of `cover([signal.lon, signal.lat])`
  # as a where clause.
  signal.update!(imo: 'in') if Footprint.exists?(...)
end

This could be done even faster as a join.

# ... is the equivalent of `cover([signal.lon, signal.lat])`
AisSignal.joins("inner join footprints on ...").update_all(imo: 'in')
