
ruby - can't create Thread (35) (ThreadError)

I'm very new to Ruby and I'm learning how to do processing in multiple threads. I parse a 170 MB XML file using Nokogiri, and I perform each database (PostgreSQL) insert in a new thread inside my .each() loop. Please suggest a better approach for handling this very large file with multiple threads. Here's what I have so far.

    conn = PGconn.connect("localhost", 5432, "", "", "oaxis","postgres","root")

    f = File.open("metadata.xml")
    doc = Nokogiri::XML(f)

    counter = 0

    threadArray = []

    doc.xpath('//Title').each do |node|
        threadArray[counter] = Thread.new {
            titleVal = node.text
            random_string = (0...10).map{ ('a'..'z').to_a[rand(26)] }.join

            conn.prepare('ins'+random_string, 'insert into sample_tbl (title) values ($1)')
            conn.exec_prepared('ins'+random_string, [titleVal])

            puts titleVal+" ==>"+random_string+ " \n"

            counter += 1
        }
    end

    threadArray.each {|t| t.join}

    f.close

What you are doing will not insert the data into the database any faster than the single-threaded case. MRI Ruby has a global interpreter lock and only ever runs Ruby code in a single thread at a time. Using threads in MRI Ruby only improves performance when the threads are performing IO (or waiting to be able to do so) and program progress does not depend on the results of those IO operations (so you don't actively wait for them).
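To see the IO case for yourself, here is a minimal sketch that uses only the standard library. `sleep` stands in for a blocking IO call (it releases the interpreter lock while waiting), and the helper names `timed` and `fake_io` are made up for this illustration.

```ruby
# Measure how long a block takes, using a monotonic clock.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Stand-in for a blocking IO call; sleeping lets other threads run.
def fake_io(seconds = 0.2)
  sleep(seconds)
end

# Four waits one after another.
serial_time = timed { 4.times { fake_io } }

# The same four waits, each in its own thread, joined at the end.
threaded_time = timed do
  4.times.map { Thread.new { fake_io } }.each(&:join)
end

puts format("serial: %.2fs, threaded: %.2fs", serial_time, threaded_time)
```

The threaded run finishes sooner because the waits overlap. Replace `sleep` with a pure-Ruby computation and the gap should largely disappear, since only one thread can execute Ruby code at a time.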

I advise you to stop using threads here and instead compute all the values you wish to insert and then mass-insert them. The code will also be simpler to understand and reason about. Even inserting them one by one from a single thread will be faster than the threaded version, but there's no reason to do that.
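A rough sketch of what a mass insert could look like, assuming the `sample_tbl` table and `title` column from the question. The helpers `build_bulk_insert` and `bulk_insert_batches` are hypothetical names invented for this example, and the batch size of 1000 is an arbitrary choice, not a tuned value. The helpers only build the SQL and parameter lists, so they run without a database; the commented usage shows how they might be fed to the `pg` gem.

```ruby
# Build a single multi-row INSERT with numbered placeholders.
# Returns [sql, params], the shape PG::Connection#exec_params expects.
# table and column are interpolated as-is, so pass trusted names only.
def build_bulk_insert(table, column, values)
  raise ArgumentError, "values must not be empty" if values.empty?
  placeholders = (1..values.size).map { |i| "($#{i})" }.join(", ")
  ["INSERT INTO #{table} (#{column}) VALUES #{placeholders}", values]
end

# Split the titles into batches so that no statement grows too large,
# and build one INSERT per batch.
def bulk_insert_batches(titles, batch_size = 1000)
  titles.each_slice(batch_size).map do |batch|
    build_bulk_insert("sample_tbl", "title", batch)
  end
end

# Usage sketch (assumes the pg and nokogiri gems, plus the conn and doc
# objects from the question):
#
#   titles = doc.xpath('//Title').map(&:text)
#   conn.transaction do
#     bulk_insert_batches(titles).each do |sql, params|
#       conn.exec_params(sql, params)
#     end
#   end
```

Everything runs on one thread and one connection, so there is no shared `counter` to race on and no thread-per-row to exhaust the thread limit.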


 