Iterate within directory to zip files with python
I need to iterate through a folder, find every set of files whose names are identical except for the extension, and then archive each set (preferably using tarfile) into one file.
So I have 5 files named "example1", each with a different file extension. I need to bundle them together and output them as "example1.tar" or something similar.
This would be easy enough with a simple for loop such as:
import tarfile
from glob import glob

tar = tarfile.open('example1.tar', "w")
for output in glob('example1*'):
    tar.add(output)
tar.close()
However, there are 300 "example" files, and I need to iterate through each one and its associated 5 files in order to make this work. This is way over my head. Any advice is greatly appreciated.
You could do something like this:
import os
import tarfile
from collections import defaultdict

myfiles = os.listdir(".")  # List of all files
totar = defaultdict(list)

# now fill the defaultdict with entries; basename as keys, extensions as values
for name in myfiles:
    base, ext = os.path.splitext(name)
    totar[base].append(ext)

# iterate through all the basenames
for base in totar:
    files = [base + ext for ext in totar[base]]
    # now tar all the files in the list "files"
    tar = tarfile.open(base + ".tar", "w")
    for item in files:
        tar.add(item)
    tar.close()
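A variant of the same grouping idea, wrapped in a function so it can be pointed at any folder. This is only a sketch: the name `tar_by_basename` is made up for illustration, and skipping sub-directories and existing `.tar` files is an assumption about what is wanted, so that a second run does not pack the archives from the first run.

```python
import os
import tarfile
from collections import defaultdict


def tar_by_basename(directory="."):
    """Create one <base>.tar per group of files that share a base name."""
    groups = defaultdict(list)
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        base, ext = os.path.splitext(name)
        # Assumption: ignore sub-directories and archives left by an earlier run.
        if not os.path.isfile(path) or ext == ".tar":
            continue
        groups[base].append(name)

    created = []
    for base, names in groups.items():
        archive = os.path.join(directory, base + ".tar")
        with tarfile.open(archive, "w") as tar:
            for name in sorted(names):
                # arcname stores the bare file name, without the directory prefix.
                tar.add(os.path.join(directory, name), arcname=name)
        created.append(archive)
    return sorted(created)
```

Calling `tar_by_basename("some_folder")` returns the paths of the archives it wrote, which makes it easy to check that 300 groups really produced 300 tar files.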
The pattern you're describing generalizes to MapReduce. I found a simple implementation of MapReduce online, from which an even simpler version is:
def map_reduce(data, mapper, reducer):
    d = {}
    for elem in data:
        key, value = mapper(elem)
        d.setdefault(key, []).append(value)
    for key, grp in d.items():
        d[key] = reducer(key, grp)
    return d
You want to group all files by their name without the extension, which you can get from os.path.splitext(fname)[0]. Then, you want to make a tarball out of each group by using the tarfile module. In code, that is:
import os
import tarfile

def make_tar(basename, files):
    tar = tarfile.open(basename + '.tar', 'w')
    for f in files:
        tar.add(f)
    tar.close()

map_reduce(os.listdir('.'),
           lambda x: (os.path.splitext(x)[0], x),
           make_tar)
Edit: If you want to group files in different ways, you just need to modify the second argument to map_reduce. The code above groups files that have the same value for the expression os.path.splitext(x)[0]. So to group by the base file name with all the extensions stripped off, you could replace that expression with strip_all_ext(x) and add:
def strip_all_ext(path):
    head, tail = os.path.split(path)
    basename = tail.split(os.extsep)[0]
    return os.path.join(head, basename)
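To make the difference between the two grouping keys concrete, here is a small comparison on a made-up file name with a double extension. The file name is hypothetical; `strip_all_ext` is repeated so the snippet runs on its own.

```python
import os


def strip_all_ext(path):
    head, tail = os.path.split(path)
    basename = tail.split(os.extsep)[0]
    return os.path.join(head, basename)


name = "example1.tar.gz"  # hypothetical file name with two extensions

single = os.path.splitext(name)[0]  # drops only the last extension
stripped = strip_all_ext(name)      # drops everything after the first dot

print(single)    # example1.tar
print(stripped)  # example1
```

One thing to watch for: a name that starts with a dot, such as ".bashrc", has nothing before its first dot, so `strip_all_ext` maps it to an empty string.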
You have two problems. Solve them separately.
1. Finding matching names. Use a collections.defaultdict.
2. Creating tar files after you find the matching names. You've got that pretty well covered.
So, solve problem 1 first.
Use glob to get all the names. Use os.path.basename to separate the file name from its directory path. Use os.path.splitext to split the name and extension.
A dictionary of lists can be used to save all files that have the same name.
Is that what you're doing in part 1?
Part 2 is putting the files into tar archives. For that, you've got most of the code you need.
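The two parts described above could be sketched roughly as follows. The function names `group_by_stem` and `write_tars`, the `pattern` and `out_dir` parameters, and the choice to skip anything that is not a regular file are all assumptions made for illustration, not something the answer specifies.

```python
import glob
import os
import tarfile
from collections import defaultdict


def group_by_stem(pattern):
    """Part 1: map each extension-less file name to the paths that share it."""
    groups = defaultdict(list)
    for path in glob.glob(pattern):
        if not os.path.isfile(path):
            continue  # assumption: directories are not wanted in the archives
        filename = os.path.basename(path)         # drop the directory part
        stem, _ext = os.path.splitext(filename)   # drop the last extension
        groups[stem].append(path)
    return groups


def write_tars(groups, out_dir):
    """Part 2: write one <stem>.tar into out_dir for every group."""
    for stem, paths in groups.items():
        with tarfile.open(os.path.join(out_dir, stem + ".tar"), "w") as tar:
            for path in sorted(paths):
                tar.add(path, arcname=os.path.basename(path))
```

Writing the archives to a separate `out_dir` keeps them out of the folder being scanned, so a later run with the same pattern does not pick them up.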
Try using the glob module: http://docs.python.org/library/glob.html
#!/usr/bin/env python
import os
import tarfile

tarfiles = {}
for f in os.listdir('files'):
    # group by the name without its last extension
    prefix = os.path.splitext(f)[0]
    tarfiles.setdefault(prefix, []).append(f)

for k, v in tarfiles.items():
    tf = tarfile.open('%s.tar.gz' % k, 'w:gz')
    for f in v:
        # add() records the file's size and contents; arcname drops the 'files/' prefix
        tf.add(os.path.join('files', f), arcname=f)
    tf.close()
import os
import tarfile

allfiles = {}
for filename in os.listdir("."):
    # name without its last extension
    basename = os.path.splitext(filename)[0]
    if basename not in allfiles:
        allfiles[basename] = [filename]
    else:
        allfiles[basename].append(filename)

for basename, filenames in allfiles.items():
    # skip names that occur only once
    if len(filenames) < 2:
        continue
    tardata = tarfile.open(basename + ".tar", "w")
    for filename in filenames:
        tardata.add(filename)
    tardata.close()
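Whichever approach is used, it can be worth checking what actually ended up in an archive. A minimal sketch, assuming a hypothetical helper named `list_members`; the demo files are throwaway files created in a temporary directory.

```python
import os
import tarfile
import tempfile


def list_members(archive_path):
    """Return the sorted member names stored in a tar archive."""
    with tarfile.open(archive_path, "r") as tar:
        return sorted(tar.getnames())


# Self-contained demonstration with throwaway files (names are made up).
workdir = tempfile.mkdtemp()
for name in ("example1.txt", "example1.csv"):
    with open(os.path.join(workdir, name), "w") as handle:
        handle.write("demo\n")

archive = os.path.join(workdir, "example1.tar")
with tarfile.open(archive, "w") as tar:
    for name in ("example1.txt", "example1.csv"):
        tar.add(os.path.join(workdir, name), arcname=name)

print(list_members(archive))
```

Opening with mode "r" lets tarfile detect compression on its own, so the same helper should also work for the .tar.gz files produced by the gzip variant.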