
Iterate over a large external Postgres DB, manipulate rows, and write output to a Rails Postgres DB

I've got a Postgres DB with around 200,000,000 rows and 6 columns. The columns are of types int, date, and string. There is no primary key, and no unique values on which to base one.

The records in this DB contain the raw data I need for one of my Rails Postgres models. I'd like to iterate through the entire external DB, perform a calculation on each row, and then write the output to my Rails model.

I've got no issue connecting to the DB or accessing records through ActiveRecord, but everything I try for iterating over the DB is failing or taking far too long. I've tried the following:

  • ExternalDB.all.each
  • ExternalDB.find_all.each
  • Adding an "id" column to ExternalDB using these instructions

I think the answer will be to do the iterations using SQL, but I'm not even sure how to start on that.

From a Postgres perspective:

You don't need unique values in order to have an index. (Unique indexes do exist, to be sure, and a primary key is enforced through one, but uniqueness isn't required for an index.)

The first thing would be to create an index on the columns you will be searching on. If you're going to be searching on all 6 of the columns you mentioned, then try creating a single index on those 6 columns.
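As a minimal sketch of creating such an index from Ruby, the helper below builds the CREATE INDEX statement for a plain (non-unique) multicolumn index. The table, column, and index names are hypothetical placeholders, and the commented conn.exec line assumes a PG::Connection from the pg gem (for example, the one returned by ExternalDB.connection.raw_connection under the PostgreSQL adapter).

```ruby
# Quote an SQL identifier the standard way: wrap it in double quotes and
# double any embedded double quotes.
def quote_ident(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

# Build a CREATE INDEX statement for a plain (non-unique) B-tree index over
# several columns. Nothing about the data needs to be unique for this.
# Column order matters: put the columns your queries filter on most first.
def multicolumn_index_sql(table, columns, index_name:)
  cols = columns.map { |c| quote_ident(c) }.join(", ")
  "CREATE INDEX #{quote_ident(index_name)} ON #{quote_ident(table)} (#{cols})"
end

# Hypothetical names; substitute your own table and columns.
sql = multicolumn_index_sql("external_rows",
                            %w[col_a col_b col_c col_d col_e col_f],
                            index_name: "external_rows_all_cols_idx")
puts sql

# With a PG::Connection in hand, you would then run:
#   conn.exec(sql)
```

If the table has to stay writable while the index is built, PostgreSQL also offers CREATE INDEX CONCURRENTLY, which takes longer and cannot run inside a transaction block.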

Depending on the exact nature of your query, however, having that index doesn't guarantee it will be used. That depends partly on how many rows the query planner estimates the query will return, which in turn determines whether it scans via the index or falls back to a sequential scan.

So, once you've created the index, take a SELECT you'd actually want to use and try it in psql or pgAdmin: run EXPLAIN on it to see whether the query planner intends to use the index, and then run the query itself to see how it performs.
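A minimal sketch of that check in Ruby, assuming the pg gem: explain_lines runs a plain EXPLAIN (which plans the query without executing it) and collects the "QUERY PLAN" column, and uses_index? looks for index-based scan nodes in the plan text. The helper names and the example query are hypothetical, and matching on node names is only a rough heuristic, so reading the full plan yourself is still worthwhile.

```ruby
# Scan node names as they appear in PostgreSQL's text EXPLAIN output.
SCAN_NODE = /(Index Only Scan|Bitmap Index Scan|Bitmap Heap Scan|Index Scan|Seq Scan)/

# Run EXPLAIN (without ANALYZE, so the query is planned but not executed) and
# return the plan as an array of text lines. `conn` is assumed to be a
# PG::Connection; each result row is a hash keyed by the "QUERY PLAN" column.
def explain_lines(conn, select_sql)
  conn.exec("EXPLAIN #{select_sql}").map { |row| row["QUERY PLAN"] }
end

# List the distinct scan node types mentioned in a plan.
def scan_nodes(plan_lines)
  plan_lines.flat_map { |line| line.scan(SCAN_NODE).flatten }.uniq
end

# True if the planner chose any index-based scan for the query.
def uses_index?(plan_lines)
  scan_nodes(plan_lines).any? { |node| node.include?("Index") }
end

# Example (hypothetical table and column names):
#   plan = explain_lines(conn, "SELECT * FROM external_rows WHERE col_a = 42")
#   puts plan
#   puts uses_index?(plan) ? "index used" : "sequential scan"
```

Once the plan looks right, running EXPLAIN ANALYZE instead actually executes the query and reports real timings, which covers the "see how it performs" step.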

If it performs well, you can then integrate it back into your Rails code, probably via raw SQL.

You will want to use a cursor, either a protocol-level one or an SQL-level cursor with DECLARE and FETCH, so that the rows are read in batches instead of all at once.
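Here is a minimal sketch of the SQL-level approach, assuming the pg gem. The helper each_batch_via_cursor is hypothetical (not part of any library): it opens a transaction, declares a cursor over your query, and FETCHes a fixed number of rows per round trip, so only one batch is held in memory at a time. The transaction is needed because a cursor declared without WITH HOLD only exists until its transaction ends. The cursor name and batch size are arbitrary choices.

```ruby
# Stream the rows of `select_sql` through a server-side cursor, yielding
# them `batch_size` rows at a time. `conn` is assumed to behave like a
# PG::Connection: #exec(sql) returns an enumerable of row hashes whose
# values are strings. `select_sql` and `cursor_name` are trusted SQL.
def each_batch_via_cursor(conn, select_sql, batch_size: 10_000, cursor_name: "external_rows_cur")
  conn.exec("BEGIN") # a cursor without WITH HOLD lives only inside a transaction
  finished = false
  begin
    conn.exec("DECLARE #{cursor_name} NO SCROLL CURSOR FOR #{select_sql}")
    loop do
      rows = conn.exec("FETCH FORWARD #{Integer(batch_size)} FROM #{cursor_name}").to_a
      break if rows.empty? # cursor exhausted
      yield rows
    end
    conn.exec("CLOSE #{cursor_name}")
    conn.exec("COMMIT")
    finished = true
  ensure
    # Reached on an exception or an early `break` from the caller's block;
    # rolling back also closes the cursor.
    conn.exec("ROLLBACK") unless finished
  end
end

# Usage inside Rails (hypothetical names: `compute` stands for your per-row
# calculation returning an attributes hash, `MyModel` for the target model):
#   raw = ExternalDB.connection.raw_connection
#   each_batch_via_cursor(raw, "SELECT * FROM external_rows") do |rows|
#     MyModel.insert_all(rows.map { |row| compute(row) })
#   end
```

Because ExternalDB points at a different database, the writes through MyModel go over the Rails app's own connection and are not part of the cursor's transaction. insert_all (Rails 6 and later) writes each batch in a single statement but skips validations and callbacks.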

Handily, someone has already written an ActiveRecord adapter for PostgreSQL cursors; see RubyGems.

You might also find this question informative: Are there any Ruby ORMs which use cursors or smart fetch?

I haven't checked the source code or docs to see whether the pg gem supports PostgreSQL's protocol-level cursors for batched reads, but since there's already a tool to do it (linked above), it's probably not worth exploring.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.
