
python subprocess and mysqldump

I know parts of this question have been asked before, but I have some related questions.

I'm trying to execute

mysqldump -u uname -ppassword --add-drop-database --databases databaseName | gzip > fileName

I'm potentially dumping a very large (200GB?) db. Is that in itself a dumb thing to do? I then want to send the zipped file over the network for storage, delete the local dump, and purge a couple of tables.

Anyway, I was using subprocess like this, because there doesn't seem to be a way to execute the entire original call without subprocess treating | as a table name:

from subprocess import Popen, PIPE

f = open(FILENAME, 'wb')
args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']

p1 = Popen(args, stdout=PIPE)
p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p2.communicate()

but then I read that communicate caches the data in memory, which wouldn't work for me. Is this true?

What I ended up doing for now is:

import gzip
import subprocess

f = open(filename, 'wb')
subprocess.call(args, stdout=f)   # write the uncompressed dump to disk first
f.close()

f = open(filename, 'rb')
zipFilename = filename + '.gz'
f2 = gzip.open(zipFilename, 'wb')
f2.writelines(f)                  # then re-read it and compress it
f2.close()
f.close()

Of course this takes a million years, and I hate it.

My questions:

1. Can I use my first approach on a very large db?
2. Could I possibly pipe the output of mysqldump to a socket and fire it across the network and save it when it arrives, rather than sending a zipped file?

Thanks!

You don't need communicate(). It's only there as a convenience method if you want to read stdout/stderr to completion. But since you are chaining the commands, they are doing that for you. Just wait for them to complete.

from subprocess import Popen, PIPE

args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']

with open(FILENAME, 'wb', 0) as f:
    p1 = Popen(args, stdout=PIPE)
    p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p1.stdout.close() # force write error (/SIGPIPE) if p2 dies
p2.wait()
p1.wait()

Your example code using two subprocess.Popen calls is correct (albeit slightly improvable), and this:

... I read that communicate caches the data in memory

is also correct—it reads into memory all of the standard output and standard error output that the "communicating command" produces on a subprocess.PIPE—but that is not a problem here, because you have this:

p1 = Popen(args, stdout=PIPE)
p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p2.communicate()

You're calling communicate() on p2, whose stdout output is sent to f (an opened file), and whose stderr output—which is probably empty anyway (no errors occur)—is not being sent to a PIPE. Thus, p2.communicate() would at worst have to read and buffer up a grand total of zero bytes of stdout plus zero bytes of stderr. It's actually a bit more clever: noticing that there is no PIPE, it returns the tuple (None, None).
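
A quick way to see this for yourself (a minimal sketch of my own, not from the answer, assuming a POSIX echo is available):

import os
from subprocess import Popen

# stdout goes to a real file, not a PIPE, so communicate() has nothing to buffer
with open(os.devnull, 'wb') as devnull:
    p = Popen(['echo', 'hello'], stdout=devnull)
    out, err = p.communicate()
print(out, err)   # prints: None None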

If you were to call p1.communicate(), that would be more of an issue (although in this case you'd then be fighting with p2, the gzip process, for the output from p1, which would be even worse). But you are not; p1's output flows to p2, and p2's output flows to a file.

Since none of p2's output is sent to a PIPE, there is no need to call p2.communicate() here: you can simply call p2.wait(). That makes it clearer that there's no data flowing back from p2 (which I would say is a minor improvement to the code, although if you decide you want to capture p2's stderr after all, you'd have to change that back).


Edit to add: as in glglgl's answer, it's important to close p1's pipe to p2 after creating p2, otherwise p2 will also wait for your Python process to send it data.
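
Putting those two points together, a minimal sketch of the adjusted version of your snippet might look like this (reusing the args and FILENAME placeholders from the question):

from subprocess import Popen, PIPE

f = open(FILENAME, 'wb')
p1 = Popen(args, stdout=PIPE)
p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p1.stdout.close()   # drop the parent's copy so p1 gets SIGPIPE if p2 dies
p2.wait()           # nothing flows back through a PIPE, so wait() is enough
p1.wait()
f.close()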

You are quite close to where you want to be:

from subprocess import Popen, PIPE

f = open(FILENAME, 'wb')
args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']

p1 = Popen(args, stdout=PIPE)

Up to here it is right.

p2 = Popen('gzip', stdin=p1.stdout, stdout=PIPE)

This one takes p1's output and processes it. Afterwards we can (and should) immediately call p1.stdout.close().

Now we have p2.stdout, which can be read from and, without using a temporary file, sent over the network:

import socket

s = socket.create_connection(('remote_pc', port))
while True:
    r = p2.stdout.read(65536)   # read the compressed stream in chunks
    if not r: break
    s.sendall(r)                # sendall() keeps sending until the whole chunk is out
s.close()
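
To "save it when it arrives", something has to be listening on the other machine. Here is a minimal receiving-side sketch (my own illustration, not part of the answer; the port and the 'dump.sql.gz' output name are placeholders):

import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('', port))                 # same placeholder port as the sender
srv.listen(1)
conn, addr = srv.accept()
with open('dump.sql.gz', 'wb') as out:
    while True:
        chunk = conn.recv(65536)     # read whatever the sender pushed
        if not chunk:
            break
        out.write(chunk)             # write the compressed stream straight to disk
conn.close()
srv.close()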

Yup, the data is buffered in memory:

"Note The data read is buffered in memory, so do not use this method if the data size is large or unlimited." - subprocess docs

Unfortunately, at the moment there is no way to use Popen asynchronously: see PEP 3145.

Rather than doing this all in Python, you can do it manually:

os.system("mysqldump -u uname -ppassword --add-drop-database --databases databaseName | gzip > fileName

")

with the appropriate string replacements using str.format, of course; otherwise you're putting an unnecessary amount of stress on your computer, especially trying to communicate 200 GB via a pipe...
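
For example, a rough sketch of that substitution (the user, password, database and output names below are the same placeholders used elsewhere on this page, not real settings):

import os

cmd = ("mysqldump -u {user} -p{password} --add-drop-database "
       "--databases {db} | gzip > {outfile}")
os.system(cmd.format(user='uname', password='password',
                     db='databaseName', outfile='fileName'))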

Can you elaborate on what you are trying to do? Right now it sounds like you're both dumping and zipping on the same computer.


Yes, you can stream a file across the network... I don't know if you want to stream the output of mysqldump directly, though - you might want to look at your network capabilities before considering that.


bash:

#!/bin/bash
mysqldump -u uname -ppassword --add-drop-database --databases databaseName | gzip > fileName
 #transfer fileName to other computer

^ you can also put this in a crontab and have it run at intervals :)
