How to run some jobs in parallel

After reading all the posts about parallelism in Ruby, I only got more confused, so I will describe what I want to do.

I have an array, names, that contains around 1000 names.

names
=> [{"name"=>"tickets"}, {"name"=>"events"}, {"name"=>"channel"}, {"name"=>"primes"}]

For each name, I want to drop the table if it exists and then create it again, using pg.

drop_str = "DROP TABLE IF EXISTS %s ;"
create_str = "CREATE TABLE %s (id SERIAL PRIMARY KEY,bkk varchar(255))"

names.each do |name|
    conn.exec((drop_str % name["name"]) + (create_str % name["name"]))
end

But I do not want to drop the tables one after another. I want to do it in parallel.

My idea is to use the following:

threads = []
drop_str = "DROP TABLE IF EXISTS %s ;"
create_str = "CREATE TABLE %s (id SERIAL PRIMARY KEY,bkk varchar(255))"

names.each do |name|
    threads.push(Thread.new{conn.exec((drop_str % name["name"]) + (create_str % name["name"]))})
end

and then to join the threads.

In reality, will the tables be dropped in parallel, or still one after another?

In principle, you can run multiple SQL statements in parallel. Most database engines can execute statements from several connections at the same time. Sometimes it doesn't make much sense though, as when using SQLite.

There are several caveats, though, which will probably break your current code.

Most importantly, a single connection to your database always has some state attached to it. Often, it will hold transactions and the internal state of the database adapter. As such, a single database connection is generally only usable in a single thread at a time. If you attempt to send multiple parallel statements over a single connection, things will probably break in non-deterministic ways.

Thus, when trying to run multiple statements in parallel using threads, each thread needs its own database connection. Here, it often makes sense to use thread pools, which create a fixed maximum number of connections and schedule work from a queue to run on them.
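A minimal sketch of that idea, using only Ruby's built-in Thread and Queue rather than a pooling library. The method name recreate_tables, the connect: callable and the workers: count are hypothetical names introduced for illustration; the SQL strings are the ones from the question, and the table names are assumed to come from a trusted source.

```ruby
# Sketch: a fixed number of worker threads, each with its OWN connection,
# pulling table names from a shared queue.
#
# `connect` is an assumed callable that returns an object responding to
# #exec and #close, e.g. -> { PG.connect(dbname: "mydb") } with the pg gem.
def recreate_tables(names, connect:, workers: 4)
  drop_str   = "DROP TABLE IF EXISTS %s ;"
  create_str = "CREATE TABLE %s (id SERIAL PRIMARY KEY,bkk varchar(255))"

  # Fill the work queue, then close it so that #pop returns nil
  # once every table name has been handed out.
  queue = Queue.new
  names.each { |name| queue << name["name"] }
  queue.close

  threads = Array.new(workers) do
    Thread.new do
      conn = connect.call # one connection per thread, never shared
      begin
        while (table = queue.pop)
          conn.exec((drop_str % table) + (create_str % table))
        end
      ensure
        conn.close
      end
    end
  end

  # Wait until all workers have drained the queue.
  threads.each(&:join)
end
```

With the pg gem, this could be called as recreate_tables(names, connect: -> { PG.connect(dbname: "mydb") }, workers: 8), where the database name and worker count are placeholders. The number of workers caps the number of open connections, so it should stay below the server's connection limit.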

You could use, e.g., Rails' ConnectionPool to handle the database connections and schedule your statements using one of the ThreadPool implementations from the excellent concurrent-ruby gem.
