Iteratively zipping files in a directory with Python

Question

I need to iterate through a folder, find every group of files whose names are identical (except for the extension), and then zip (preferably using tarfile) each group into one file.

So I have 5 files named "example1", each with a different file extension. I need to zip them up together and output them as "example1.tar" or something similar.

This would be easy enough with a simple for loop such as:

import tarfile
from glob import glob

tar = tarfile.open('example1.tar', "w")
for output in glob('example1*'):
    tar.add(output)
tar.close()

However, there are 300 "example" files, and I need to iterate through each one and its 5 associated files to make this work. This is way over my head. Any advice greatly appreciated.

Answer 1

You could do this:

list all files in the directory
create a dictionary where the basename is the key and all the extensions are values
then tar all the files by dictionary key

Something like this:

import os
import tarfile
from collections import defaultdict

myfiles = os.listdir(".")   # List of all files
totar = defaultdict(list)

# now fill the defaultdict with entries; basename as keys, extensions as values
for name in myfiles:
    base, ext = os.path.splitext(name)
    totar[base].append(ext)

# iterate through all the basenames
for base in totar:
    files = [base+ext for ext in totar[base]]
    # now tar all the files in the list "files"
    tar = tarfile.open(base+".tar", "w")
    for item in files:    
        tar.add(item)
    tar.close()
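One caveat with the snippet above: os.listdir also returns subdirectories, and on a second run it would pick up the .tar files from the first run. Here is a minimal sketch of the same idea wrapped in a function that skips both. The name tar_by_basename and its directory parameter are my own, not part of the answer, and the skip rule is an assumption about what you want.

```python
import os
import tarfile
from collections import defaultdict


def tar_by_basename(directory="."):
    """Write one <base>.tar per group of files sharing a base name.

    Returns a dict mapping each archive path to the member names it contains.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        base, ext = os.path.splitext(name)
        # Assumption: skip subdirectories and archives left by an earlier run.
        if not os.path.isfile(path) or ext == ".tar":
            continue
        groups[base].append(name)

    written = {}
    for base, names in groups.items():
        archive = os.path.join(directory, base + ".tar")
        # The with-block closes the archive even if add() raises.
        with tarfile.open(archive, "w") as tar:
            for name in names:
                # arcname keeps the directory prefix out of the archive.
                tar.add(os.path.join(directory, name), arcname=name)
        written[archive] = names
    return written
```

Calling tar_by_basename("some_folder") should leave one archive per base name next to the original files.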

Answer 2

The pattern you're describing generalizes to MapReduce. I found a simple implementation of MapReduce online, from which an even simpler version is:

def map_reduce(data, mapper, reducer):
    d = {}
    for elem in data:
        key, value = mapper(elem)
        d.setdefault(key, []).append(value)
    for key, grp in d.items():
        d[key] = reducer(key, grp)
    return d

You want to group all files by their name without the extension, which you can get from os.path.splitext(fname)[0]. Then, you want to make a tarball out of each group by using the tarfile module. In code, that is:

import os
import tarfile

def make_tar(basename, files):
    tar = tarfile.open(basename + '.tar', 'w')
    for f in files:
        tar.add(f)
    tar.close()

map_reduce(os.listdir('.'),
           lambda x: (os.path.splitext(x)[0], x),
           make_tar)
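If it helps to see the grouping on its own, here is a small sketch that runs map_reduce on an in-memory list instead of a directory. The file names are made up for illustration, and the reducer just sorts each group rather than writing a tarball.

```python
import os


def map_reduce(data, mapper, reducer):
    # Same helper as above, repeated so this sketch runs on its own.
    d = {}
    for elem in data:
        key, value = mapper(elem)
        d.setdefault(key, []).append(value)
    for key, grp in d.items():
        d[key] = reducer(key, grp)
    return d


# Hypothetical file names, used only for illustration.
names = ["example1.shp", "example1.dbf", "example2.shp", "notes.txt"]

groups = map_reduce(
    names,
    lambda x: (os.path.splitext(x)[0], x),  # key: name without extension
    lambda key, grp: sorted(grp),           # reducer: keep the sorted group
)
print(groups)
```

Each key in the result is a base name and each value is whatever the reducer returned for that group, so swapping the reducer for make_tar gives the version above.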

Edit: If you want to group files in different ways, you just need to modify the second argument to map_reduce. The code above groups files that have the same value for the expression os.path.splitext(x)[0]. So to group by the base file name with all the extensions stripped off, you could replace that expression with strip_all_ext(x) and add:

def strip_all_ext(path):
    head, tail = os.path.split(path)
    basename = tail.split(os.extsep)[0]
    return os.path.join(head, basename)
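To make the difference concrete, here is a quick comparison on a name with two extensions. The file name is only an example.

```python
import os


def strip_all_ext(path):
    # Same helper as above, repeated so this sketch runs on its own.
    head, tail = os.path.split(path)
    basename = tail.split(os.extsep)[0]
    return os.path.join(head, basename)


name = "example1.tar.gz"
print(os.path.splitext(name)[0])  # example1.tar
print(strip_all_ext(name))        # example1
```

Note that strip_all_ext returns an empty string for a name that starts with a dot, such as ".bashrc", so hidden files may need separate handling.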

Answer 3

You have two problems. Solve them separately.

1. Finding matching names. Use a collections.defaultdict.
2. Creating tar files after you find the matching names. You've got that pretty well covered.

So. Solve problem 1 first.

Use glob to get all the names. Use os.path.basename to strip the directory part from each path. Use os.path.splitext to split the name and extension.

A dictionary of lists can be used to save all files that have the same name.
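A rough sketch of that first part, assuming a glob pattern is enough to select the files. The function name group_by_name is made up for illustration.

```python
import glob
import os
from collections import defaultdict


def group_by_name(pattern):
    """Map each base name to the sorted list of paths that share it."""
    groups = defaultdict(list)
    for path in glob.glob(pattern):
        filename = os.path.basename(path)        # drop the directory part
        stem, _ext = os.path.splitext(filename)  # drop the last extension
        groups[stem].append(path)
    return {stem: sorted(paths) for stem, paths in groups.items()}
```

For example, group_by_name("example*") would collect every file in the current directory whose name starts with "example", keyed by the name without its extension.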

Is that what you're doing in part 1?

Part 2 is putting the files into tar archives. For that, you've got most of the code you need.

Answer 4

Try using the glob module: http://docs.python.org/library/glob.html

Answer 5

#!/usr/bin/env python

import os
import tarfile

tarfiles = {}
for f in os.listdir('files'):
    # splitext also handles names without a dot, where rfind('.') returns -1
    prefix = os.path.splitext(f)[0]
    tarfiles.setdefault(prefix, []).append(f)

for k, v in tarfiles.items():
    tf = tarfile.open('%s.tar.gz' % k, 'w:gz')
    for f in v:
        # add() stores the real size and contents; a bare TarInfo(f) has size 0
        tf.add(os.path.join('files', f), arcname=f)
    tf.close()

Answer 6

import os
import tarfile

allfiles = {}

for filename in os.listdir("."):
    # everything before the last dot (empty for names without a dot)
    basename = '.'.join(filename.split(".")[:-1])
    if basename not in allfiles:
        allfiles[basename] = [filename]
    else:
        allfiles[basename].append(filename)

for basename, filenames in allfiles.items():
    if len(filenames) < 2:
        continue
    tardata = tarfile.open(basename+".tar", "w")
    for filename in filenames:
        tardata.add(filename)
    tardata.close()

Answer 1: 2, 2011-05-06 19:36:09
Answer 2: 2, accepted, 2011-05-06 20:04:25
Answer 3: 1, 2011-05-06 19:29:55
Answer 4: 0, 2011-05-06 19:28:46
Answer 5: 0, 2011-05-06 19:37:32
Answer 6: -1, 2011-05-06 19:42:13
