
Automating data dump and restore between Postgres and MySQL

I've just set up a delta-loading data flow between multiple MySQL DBs and a Postgres DB. It only copies tens of MB every 15 minutes.

Still, I'd like to set up a process to fully reload the data between them in case of emergency...

Python just crashes and doesn't seem to be fast enough when using SQLAlchemy etc.

I've read that the best approach might be to dump everything from MySQL into CSV and then use file_fdw to load the entire tables into Postgres.
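For reference, here is a minimal sketch of that file_fdw approach, driven from Python with SQLAlchemy. The connection URL, the csv_files server name, the items/items_csv table names and the CSV path are all hypothetical placeholders; note that file_fdw reads the file from the Postgres server's own filesystem.

from sqlalchemy import create_engine, text

# Hypothetical connection URL -- adjust to your setup
eng_pg = create_engine("postgresql://user:pass@pg-host/mydb")

statements = [
    # file_fdw must be installed once per database
    "CREATE EXTENSION IF NOT EXISTS file_fdw",
    "CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw",
    # the foreign table only points at the CSV; nothing is copied yet
    """CREATE FOREIGN TABLE items_csv (
           id   integer,
           name text
       ) SERVER csv_files
         OPTIONS (filename '/path/on/pg/server/items.csv',
                  format 'csv', header 'true')""",
    # the actual bulk load then happens entirely inside Postgres
    "INSERT INTO items SELECT * FROM items_csv",
]

with eng_pg.begin() as conn:
    for stmt in statements:
        conn.execute(text(stmt))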

Has anyone faced a similar issue? If so, how did you proceed?

Long story short: ORM overhead is killing your performance.

When you're not manipulating the objects involved, it's better to use SQLAlchemy Core expressions ("SQL Expressions"), which are almost as fast as pure SQL.

Solution:

Of course, I'm presuming your MySQL and Postgres models have been kept meticulously in sync (i.e. values from a MySQL object pose no problem when creating an object in the Postgres model, and vice versa).

Overview:

  • get Table objects out of declarative classes
  • select (SQLAlchemy Expression) from one database
  • convert rows to dicts
  • insert into the other database

More or less:

from sqlalchemy import select, and_

# get the Table objects behind the declarative classes
m_table = ItemMySQL.__table__
pg_table = ItemPG.__table__

# SQL Expression that gets a range of rows quickly
pg_q = select([pg_table]).where(
    and_(
        pg_table.c.id >= id_start,
        pg_table.c.id <= id_end,
    )
)

# get PG DB rows
eng_pg = DBSessionPG.get_bind()
conn_pg = eng_pg.connect()
result = conn_pg.execute(pg_q)
rows_pg = result.fetchall()

# get a MySQL connection the same way
# (DBSessionMySQL is assumed to be the MySQL counterpart of DBSessionPG)
conn_m = DBSessionMySQL.get_bind().connect()

for row_pg in rows_pg:
    # convert PG row object into dict
    value_d = dict(row_pg)
    # insert into MySQL -- the statement must actually be executed
    conn_m.execute(m_table.insert().values(**value_d))

# close result proxy and connections, else suffer leaks
result.close()
conn_pg.close()
conn_m.close()

For background on performance, see the accepted answer to this question (by the SQLAlchemy principal author himself):

Why is SQLAlchemy insert with sqlite 25 times slower than using sqlite3 directly?

Since Python seems to be crashing on you, perhaps you're using too much memory? Hence I suggest reading and writing rows in batches, as sketched below.
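A minimal sketch of such batching, reusing the table objects from the snippet above; BATCH_SIZE is a hypothetical tuning knob, and DBSessionMySQL is again assumed to be the MySQL counterpart of DBSessionPG.

BATCH_SIZE = 10000  # hypothetical; tune so one batch fits comfortably in memory

eng_pg = DBSessionPG.get_bind()
eng_m = DBSessionMySQL.get_bind()   # assumed MySQL session/engine

with eng_pg.connect() as conn_pg, eng_m.begin() as conn_m:
    result = conn_pg.execute(select([pg_table]).order_by(pg_table.c.id))
    while True:
        rows = result.fetchmany(BATCH_SIZE)   # keep at most BATCH_SIZE rows in memory
        if not rows:
            break
        # executemany-style insert of the whole batch in one call
        conn_m.execute(m_table.insert(), [dict(row) for row in rows])
    result.close()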

A further improvement could be using .values() to insert a number of rows in one call; see here: http://docs.sqlalchemy.org/en/latest/core/tutorial.html#inserts-and-updates
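For example (a sketch reusing conn_m and m_table from above; value_dicts is an assumed list of row dictionaries built with dict(row)):

# renders one multi-row INSERT ... VALUES (...), (...), ... statement
conn_m.execute(m_table.insert().values(value_dicts))

The difference from the batching sketch above is that .values(list) renders a single multi-row VALUES clause, while execute(stmt, list) goes through the driver's executemany path.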
