How to extract a number of tar.gz files to a directory?
I'm trying to extract a number of tar.gz files, with no success.
I've tried to modify some code I was using to extract zip files. Below are my file structure, file names, and the code.
File Structure:
D:\Test\Tar
File Names:
DZB1212-500258L004001_4.tgz
DZB1213-500119L002001_2.tgz
DZB1213-500119L006001_6.tgz
Code I've tried:
import glob
import os
import re
import tarfile
import gzip
import shutil
os.chdir('E:\\SPRING2019\\SILKROAD\\Folder_Extraction_Auto\\SRTM_DEMs\\TESTEXTRACTER3\\USGS_Declassified\\Declass2_2002')
#set up pathing
tarfile_rootdir = ('E:\\SPRING2019\\SILKROAD\\Folder_Extraction_Auto\\SRTM_DEMs\\TESTEXTRACTER3\\USGS_Declassified\\Declass2_2002')
extract_rootdir = ('E:\\SPRING2019\\SILKROAD\\Folder_Extraction_Auto\\TEST')
#process the zip files [a-zA-Z] to [\w] and removed the _ seperating the two WORKED!!!!!!!!!!!!
re_pattern = re.compile(r'\A([\w+]*)')
#CHANGED ABOVE CREATED HTO_O with no subfolers but all extracted
for tar_file in glob.iglob(os.path.join(tarfile_rootdir, '*.tar.gz')):
    part = re.findall(re_pattern, os.path.basename(tar_file))[0]
    part = [item.upper() for item in part]
    folder = {'outer': '{0}{1}{2}{3}'.format(*part), 'inner': '{0}{1}{2}{3}'.format(*part)}
    extract_path = os.path.join(extract_rootdir, folder['outer'])
    with tarfile.open(tar_file, 'r:gz') as tarfile:
        tar_file.extractall(extract_path)
It will run, but nothing happens.
import glob, os, re, tarfile
# Setup main paths.
tarfile_rootdir = r'D:\SPRING2019\Tarfiles'
extract_rootdir = r'D:\SPRING2019\Test'
# Process the files.
re_pattern = re.compile(r'\A(\w+)-\d+[a-zA-Z]0{0,5}(\d+)')
for tar_file in glob.iglob(os.path.join(tarfile_rootdir, '*.tgz')):
    # Get the parts from the base tgz filename using regular expressions.
    part = re.findall(re_pattern, os.path.basename(tar_file))[0]
    # Build the extraction path from each part.
    extract_path = os.path.join(extract_rootdir, *part)
    # Extract all files from the tar archive.
    with tarfile.open(tar_file, 'r:gz') as r:
        r.extractall(extract_path)
This code is similar to the answer to your last question. Because the information about your directory structure is unclear, I will use the following structure as an example.
TGZ files in D:\SPRING2019\Tarfiles:
DZB1216-500058L002001.tgz
DZB1216-500058L003001.tgz
Extract directory structure in D:\SPRING2019\Test:
DZB1216
    2001
    3001
The .tgz file paths are retrieved with glob.
From the example filename DZB1216-500058L002001.tgz, the regular expression will capture 2 groups:
\A is an anchor at the start of the string.
(\w+) matches DZB1216.
-\d+[a-zA-Z]0{0,5} matches up to the next group.
(\d+) matches 2001.
The extraction path is joined using the values of extract_rootdir, DZB1216, and 2001. This results in D:\SPRING2019\Test\DZB1216\2001 as the extraction path.
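To see the two captured groups and the joined path without touching the file system, the steps above can be sketched as follows. The helper name extraction_path is hypothetical, and ntpath is used here instead of os.path only so that the Windows-style result is the same on any operating system.

```python
import ntpath  # Windows path rules, so the output is the same on any OS
import re

# Pattern from the answer: the prefix before the hyphen, then the trailing
# digits with up to five leading zeros dropped.
re_pattern = re.compile(r'\A(\w+)-\d+[a-zA-Z]0{0,5}(\d+)')


def extraction_path(extract_rootdir, filename):
    """Return the extraction directory for one archive name (hypothetical helper)."""
    match = re_pattern.match(filename)
    if match is None:
        raise ValueError('unexpected file name: ' + filename)
    # match.groups() is a tuple such as ('DZB1216', '2001').
    return ntpath.join(extract_rootdir, *match.groups())


example = extraction_path(r'D:\SPRING2019\Test', 'DZB1216-500058L002001.tgz')
print(example)
```

Raising an error on a non-matching name is a design choice of this sketch; the answer's code indexes the result of re.findall directly and would fail with an IndexError instead.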
tarfile then extracts everything from the .tgz file.
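The extraction step can be tried in isolation with a throwaway archive. This is only a sketch: the sample file, archive name, and output folders are made up for illustration and live in a temporary directory. Note that the opened archive is bound to a name other than tarfile, so the module is not shadowed, and extractall is called on that archive object rather than on the path string.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Build a small sample archive so there is something to extract.
    sample = os.path.join(workdir, 'sample.txt')
    with open(sample, 'w') as handle:
        handle.write('hello')
    archive_path = os.path.join(workdir, 'sample.tgz')
    with tarfile.open(archive_path, 'w:gz') as archive:
        archive.add(sample, arcname='sample.txt')

    # Create the target folder up front; harmless if it already exists.
    extract_path = os.path.join(workdir, 'out', 'DZB1216', '2001')
    os.makedirs(extract_path, exist_ok=True)

    # 'r:gz' reads a gzip-compressed tar, which is what a .tgz file is.
    with tarfile.open(archive_path, 'r:gz') as archive:
        archive.extractall(extract_path)

    extracted = sorted(os.listdir(extract_path))
    print(extracted)
```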
It looks like your file names are *.tgz, but your glob is *.tar.gz!
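The mismatch can be checked without any files on disk, since glob applies the same shell-style wildcard rules as fnmatch. A minimal sketch using the file names from the question:

```python
import fnmatch

# File names as listed in the question.
file_names = [
    'DZB1212-500258L004001_4.tgz',
    'DZB1213-500119L002001_2.tgz',
    'DZB1213-500119L006001_6.tgz',
]

# The pattern from the question's code matches none of them...
tar_gz_matches = fnmatch.filter(file_names, '*.tar.gz')
# ...while '*.tgz' matches all three.
tgz_matches = fnmatch.filter(file_names, '*.tgz')

print(len(tar_gz_matches), len(tgz_matches))
```

With no matches, the for loop body never executes, which is consistent with the script running without errors while nothing is extracted.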