
Python: piping mysqldump output of MySQL databases through gzip using multithreading hangs

I'm looking to improve the performance of my MySQL database backups. The idea is to back up 2-3 MySQL databases simultaneously (each of them large, ~70+ GB, and mostly InnoDB), since my requirement is essentially to get the job done as quickly as possible. Below is the Python script I'm currently using for the backup process.

I wrote the backup script below, which runs mysqldump piped through gzip for each database in a separate thread. I ran it with 2 arguments to back up the staging and production DBs respectively. Midway through a test, I noticed the script was causing my Ubuntu box, which runs no other workload, to thrash on swap, and it started to hang. (Unfortunately, I couldn't take a snapshot in time as evidence, as I was focused on killing the resource-hungry tasks.)

Most importantly, I'd like your expert opinion on whether this approach can normally be done effectively. On the hardware side, my server already looks quite powerful, with 64 GB of RAM and a 32-core CPU at 2.7 GHz per core. Or could it be that my databases are so large that the system's pipe buffer isn't big enough to handle piping mysqldump output to gzip on the fly, so the system ended up swapping and subsequently froze?

backup code (backup.py)

from datetime import datetime
import threading
import ftplib
import sys
import os

hostname = '..host name..'
username = '..user name..'
password = '..password..'

class backupThread (threading.Thread):
    def __init__(self, threadID, counter, db_name):

        self.threadID = threadID
        self.counter = counter
        self.db_name = db_name
        threading.Thread.__init__(self)

    def run(self):

        dt = datetime.now().strftime('%Y%m%d%H%M')
        filename = "%s_%s.sql" % (self.db_name,dt)

        os.popen("mysqldump -u %s -p%s -h %s -e --opt -c %s | gzip -c > %s.gz" % (username, password, hostname, self.db_name, filename))


dbnames = sys.argv[1:]

i = 1   
threads = []

for dbname in dbnames:

    thread = backupThread(i, i, dbname )   
    thread.start()
    threads.append( thread )        
    i += 1  

for t in threads:
    t.join()

calling command

python backup.py staging production

Is there a way I can improve my script or get this to work as per my requirement? Any advice would be much appreciated.

The fastest and most consistent way to do it, if you can afford a few seconds of service disruption:

  • Stop the database - your system becomes unavailable from this point
  • Take a file system snapshot (takes ~2 seconds, as it generally uses a copy-on-write approach)
  • Start the database - your system is back from this point
  • Make a binary backup from the snapshot.

If you are using a Red Hat (based) Linux distribution, there is a good chance you are already using LVM. If you aren't, you need LVM or another solution that provides filesystem snapshots to adopt this approach.

If you just want to fix your script, then write the mysqldump output to disk and compress it afterwards. This should not hurt performance, since the system is writing to disk whenever it swaps anyway.
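A minimal sketch of that two-phase approach, reusing the question's placeholder credentials (subprocess.call is used instead of os.popen so that each step blocks until it finishes; this is an illustration, not the original author's code):

# Sketch: dump each database to disk first, then compress it in a second pass.
# Hostname/credentials below are placeholders, as in the question's script.
import subprocess
from datetime import datetime

hostname = '..host name..'
username = '..user name..'
password = '..password..'

def dump_then_compress(db_name):
    dt = datetime.now().strftime('%Y%m%d%H%M')
    filename = "%s_%s.sql" % (db_name, dt)
    # Phase 1: write the uncompressed dump straight to disk.
    # subprocess.call blocks until mysqldump exits, unlike os.popen.
    with open(filename, "wb") as out:
        subprocess.call(["mysqldump", "-u", username, "-p" + password,
                         "-h", hostname, "-e", "--opt", "-c", db_name],
                        stdout=out)
    # Phase 2: compress the finished dump; gzip replaces file.sql
    # with file.sql.gz on disk.
    subprocess.call(["gzip", filename])

for db in ("staging", "production"):
    dump_then_compress(db)

Compressing one dump at a time keeps memory pressure low; the dumps themselves can still run in parallel if the disk can keep up.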

[UPDATE]

I found a reference that gives you more details: https://www.percona.com/blog/2006/08/21/using-lvm-for-mysql-backup-and-replication-setup/

So you don't need to stop the server completely: you can use FLUSH TABLES WITH READ LOCK to do the work in MySQL.
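Here is a minimal sketch of that flow, assuming the MySQL datadir lives on an LVM logical volume; the volume path, snapshot size and credentials are illustrative, not from the post:

# Sketch: hold FLUSH TABLES WITH READ LOCK in one mysql session while an
# LVM snapshot is taken, then release the lock. All names are illustrative.
import subprocess

# Keep a single mysql session open: the global read lock only lasts as
# long as the session that took it.
mysql = subprocess.Popen(["mysql", "-u", "root", "-psecret", "-N"],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
mysql.stdin.write("FLUSH TABLES WITH READ LOCK;\nSELECT 'locked';\n")
mysql.stdin.flush()
mysql.stdout.readline()  # blocks until the lock is actually held

# Take the snapshot while the lock is held; this takes seconds, not hours.
subprocess.call(["lvcreate", "--snapshot", "--size", "10G",
                 "--name", "mysql_snap", "/dev/vg0/mysql"])

# Release the lock and end the session; MySQL resumes normal writes.
mysql.stdin.write("UNLOCK TABLES;\n")
mysql.stdin.close()
mysql.wait()

# The snapshot can now be mounted read-only and copied or compressed at
# leisure, e.g.:
#   mount -o ro /dev/vg0/mysql_snap /mnt/snap
#   tar czf mysql_backup.tar.gz -C /mnt/snap .

The SELECT/readline pair is just a cheap way to confirm the lock statement has finished executing before the snapshot is taken, since statements in one session run sequentially.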
