
Recursively append files to zip archive in python

In Python 2.7.4 on Windows, suppose I have the following directory structure:

test/foo/a.bak
test/foo/b.bak
test/foo/bar/c.bak
test/d.bak

I use the following code to add these files to an existing archive, with 'd.bak' meant to sit at the root of the archive:

import zipfile
import os
import fnmatch

def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename

if __name__=='__main__':
    z = zipfile.ZipFile("testarch.zip", "a", zipfile.ZIP_DEFLATED)

    for filename in find_files('test', '*.*'):
        print 'Found file:', filename
        z.write(filename, os.path.basename(filename), zipfile.ZIP_DEFLATED)

    z.close()

The resulting archive is flat. A foo/ directory entry appears only when foo contains a sub-directory (if I exclude test/foo/bar/c.bak, foo/ is not created; if I include it, foo/ is created but foo/bar/ is not), and no files or sub-directories end up inside it:

foo/
a.bak
b.bak
c.bak
d.bak

Am I missing something?

The problem is that you're explicitly asking it to flatten all the paths:

z.write(filename, os.path.basename(filename), zipfile.ZIP_DEFLATED)

If you look at the docs, the default arcname is:

the same as filename, but without a drive letter and with leading path separators removed

But you're overriding that with os.path.basename(filename). (If you don't know what basename does, it returns "the last pathname component". If you don't want just the last pathname component, don't call basename.)
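As a quick illustration of what basename throws away, here is a minimal sketch using one of the paths from the question. The variable names are only for this example.

```python
import os.path

path = "test/foo/bar/c.bak"

# basename keeps only the last component; the directories are dropped.
last_component = os.path.basename(path)
# dirname is the complementary part that basename discards.
directory_part = os.path.dirname(path)

print(last_component)
print(directory_part)
```

Passing last_component as the arcname is what loses the test/foo/bar part of the entry name.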

If you just do z.write('test/foo/bar/c.bak'), it will create a zip entry named test/foo/bar/c.bak, but if you do z.write('test/foo/bar/c.bak', 'c.bak'), it will create a zip entry named c.bak. Since you do that for all of the entries, the whole thing ends up flattened.
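The difference can be checked with a small self-contained sketch. It builds a throwaway copy of test/foo/bar/c.bak in a temporary directory (demo data, not the asker's files), writes it twice, and lists the resulting entry names. The temporary directory and the demo.zip name are assumptions made for the example.

```python
import os
import tempfile
import zipfile

# Create <tmp>/test/foo/bar/c.bak as demo data.
workdir = tempfile.mkdtemp()
rel = os.path.join("test", "foo", "bar", "c.bak")
os.makedirs(os.path.join(workdir, "test", "foo", "bar"))
with open(os.path.join(workdir, rel), "w") as f:
    f.write("demo")

# Work from inside the temporary directory so the relative path resolves.
old_cwd = os.getcwd()
os.chdir(workdir)
try:
    with zipfile.ZipFile("demo.zip", "w", zipfile.ZIP_DEFLATED) as z:
        z.write(rel)                         # default arcname: path is kept
        z.write(rel, os.path.basename(rel))  # explicit arcname: flattened
    with zipfile.ZipFile("demo.zip") as z:
        names = z.namelist()
finally:
    os.chdir(old_cwd)

print(names)
```

The first entry keeps its directories and the second one is just c.bak, which matches the flattening seen in the question.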

I figured it out. As abarnet pointed out, I had misread the zipfile docs. The following function builds the correct archive name for each entry by stripping the leading directory component from the path:

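def createArchName(path):
    # Strip the first path component ('test') so entries are relative to it.
    if "\\" in path:
        # Windows-style separators
        sep = "\\"
    else:
        # Unix-style separators
        sep = "/"
    if sep not in path:
        # No directory component to strip; use the name unchanged.
        return path
    discard, val = path.split(sep, 1)
    return val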
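An alternative to splitting on separators by hand is os.path.relpath, which computes a path relative to a given root. The sketch below is one possible approach, not the method used in this answer. arcname_relative_to is a hypothetical helper name, and it assumes every file lives under the root directory being archived.

```python
import os

def arcname_relative_to(path, root):
    """Return path relative to root, with "/" separators for the archive.

    Hypothetical helper for illustration; it is not part of zipfile.
    """
    rel = os.path.relpath(path, root)
    # zipfile normalizes separators itself; doing it here keeps the
    # returned name identical on Windows and Unix.
    return rel.replace(os.sep, "/")

nested = arcname_relative_to(os.path.join("test", "foo", "bar", "c.bak"), "test")
top_level = arcname_relative_to(os.path.join("test", "d.bak"), "test")

print(nested)
print(top_level)
```

Unlike a fixed split on the first separator, this also works when the root is more than one directory deep.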

For those interested, the full code is as follows:

import zipfile
import os
import fnmatch

def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename

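def createArchName(path):
    # Strip the first path component ('test') so entries are relative to it.
    if "\\" in path:
        # Windows-style separators
        sep = "\\"
    else:
        # Unix-style separators
        sep = "/"
    if sep not in path:
        # No directory component to strip; use the name unchanged.
        return path
    discard, val = path.split(sep, 1)
    return val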


if __name__=='__main__':
    if not os.path.exists("test"):
        os.mkdir("test")

    z = zipfile.ZipFile("testarch.zip", "a", zipfile.ZIP_DEFLATED)

    for filename in find_files('test', '*.*'):
        archname = createArchName(filename)
        print 'Found file:', archname
        z.write(filename, archname, zipfile.ZIP_DEFLATED)

    z.close()
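For comparison, here is a compact sketch of the same idea that opens the archive in a with block, so it is closed even if a write fails, and then lists the stored names to confirm the sub-directories survived. zip_tree is a hypothetical name, the temporary directory holds demo data only, and the '*.bak' pattern is an assumption chosen for the demo.

```python
import fnmatch
import os
import tempfile
import zipfile

def zip_tree(src_dir, zip_path, pattern="*"):
    """Append matching files under src_dir to zip_path, keeping sub-directories.

    Hypothetical helper: entry names are stored relative to src_dir.
    """
    with zipfile.ZipFile(zip_path, "a", zipfile.ZIP_DEFLATED) as z:
        for root, dirs, files in os.walk(src_dir):
            for basename in fnmatch.filter(files, pattern):
                filename = os.path.join(root, basename)
                z.write(filename, os.path.relpath(filename, src_dir))

# Recreate the directory layout from the question as demo data.
workdir = tempfile.mkdtemp()
src_dir = os.path.join(workdir, "test")
os.makedirs(os.path.join(src_dir, "foo", "bar"))
for rel in ("foo/a.bak", "foo/b.bak", "foo/bar/c.bak", "d.bak"):
    with open(os.path.join(src_dir, *rel.split("/")), "w") as f:
        f.write("demo")

zip_path = os.path.join(workdir, "testarch.zip")
zip_tree(src_dir, zip_path, "*.bak")

with zipfile.ZipFile(zip_path) as z:
    names = sorted(z.namelist())
print(names)
```

Because the archive is opened in append mode, running zip_tree again on the same archive would add duplicate entries rather than replace the existing ones.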
