How do I create multiple named folders and extract multiple zip files into each folder using Python?

Question

I'm having trouble writing a clean script that creates many different directories for a number of zip files containing different raster data, and then extracts all the zips into the new folders.

I have accomplished the task, but my code is very long and messy. I need folders labeled like NE34_E, NE35_E etc., and within these directories I need subfolders such as N34_24, N34_25 etc., into which the raster data will be extracted. I have over 100 zip files that need to be extracted and placed in subfolders.

After making some changes to the way I create the directories, this is a sample of my script.

My file structure goes like this:

 N\N36_E\N36_24
 N\N36_E\N35_25
 ... etc.

Zip file names:

 n36_e024_1arc_v3_bil.zip
 n36_e025_1arc_v3_bil.zip
 n36_e026_1arc_v3_bil.zip
 ... etc.

Python code to create the directory structure:

import os

#Create Sub directories for "NE36_"
pathname1 = "NE36_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)

#Create Sub directories for "NE37_"
pathname1 = "NE37_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)
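The two repeated blocks above differ only in the latitude, so one way to shorten them is a nested loop. This is a minimal sketch, assuming Python 3.2+ (for exist_ok), latitudes 36 to 37 and longitudes 24 to 45 as in the while loops, and the N36_E\NE36_24 layout the script builds; make_tile_dirs and its parameters are hypothetical names.

```python
import os


def make_tile_dirs(root, latitudes=range(36, 38), longitudes=range(24, 46)):
    """Create root/N{lat}_E/NE{lat}_{lon} for every latitude/longitude pair.

    Hypothetical helper: the default ranges mirror the while loops above
    (latitudes 36-37, longitudes 24-45) and can be changed as needed.
    Returns the list of paths it created (or found already existing).
    """
    paths = []
    for lat in latitudes:
        outer = "N{0}_E".format(lat)              # e.g. N36_E
        for lon in longitudes:
            inner = "NE{0}_{1}".format(lat, lon)  # e.g. NE36_24
            path = os.path.join(root, outer, inner)
            # exist_ok=True replaces the os.path.exists() check and also
            # creates the outer folder when it is missing.
            os.makedirs(path, exist_ok=True)
            paths.append(path)
    return paths
```

Calling make_tile_dirs(r'D:\Capstone\Test') should create the same folders as the two blocks above; widening the latitudes range covers further latitudes without copying code.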

Answer 1

import glob, os, re, zipfile

# Setup main paths.
zipfile_rootdir = r'D:\Capstone\Zipfiles'
extract_rootdir = r'D:\Capstone\Test'

# Process the zip files.
re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

for zip_file in glob.iglob(os.path.join(zipfile_rootdir, '*.zip')):

    # Get the parts from the base zip filename using regular expressions.
    part = re.findall(re_pattern, os.path.basename(zip_file))[0]

    # Make all items in part uppercase using a list comprehension.
    part = [item.upper() for item in part]

    # Create a dict of the parts to make useful parts to be used for folder names.
    # E.g. from ['N', '36', 'E', '24'] to 'N36_E' and 'NE36_24'.
    folder = {'outer': '{0}{1}_{2}'.format(*part),
              'inner': '{0}{2}{1}_{3}'.format(*part)}

    # Build the extraction path from each part.
    extract_path = os.path.join(extract_rootdir, folder['outer'], folder['inner'])

    # Extract all files from the zip file.
    # (archive is used as the name so the built-in zip() is not shadowed.)
    with zipfile.ZipFile(zip_file, 'r') as archive:
        archive.extractall(extract_path)

There are 2 main settings to set:

zipfile_rootdir is where the zip files are located.
extract_rootdir is where to extract them to.

The r before a string marks it as a raw string, so backslashes do not need to be escaped.
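A quick check of what the r prefix does: both spellings below produce the same string, and the actual value contains single backslashes.

```python
# Raw string: backslashes are kept literally.
raw_path = r'D:\Capstone\Zipfiles'

# Regular string: each backslash has to be escaped as \\.
escaped_path = 'D:\\Capstone\\Zipfiles'

print(raw_path == escaped_path)  # True
print(raw_path.count('\\'))      # 2
```

One caveat: a raw string cannot end with a single backslash (r'D:\Capstone\' is a syntax error), which is another reason to build paths with os.path.join instead of appending separators by hand.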

A regular expression is compiled and used to extract, from each zip file name, the text used to build the extraction path.

From the zip file name:

 n36_e024_1arc_v3_bil.zip

the regular expression extracts this sequence of parts:

 n, 36, e, 24
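To see the extracted sequence directly, the pattern can be run on one file name. re.findall returns a list of tuples (one tuple per match, one item per captured group), which is why the answer takes [0]. As an alternative, re_pattern.match(...).groups() gives the same tuple and returns None, rather than raising IndexError, for a name that does not fit; 'readme.zip' below is a made-up example of such a name.

```python
import re

re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')
name = 'n36_e024_1arc_v3_bil.zip'

# findall: a list holding one tuple of the four captured groups.
print(re.findall(re_pattern, name))    # [('n', '36', 'e', '24')]

# match: the same groups, or None when the name does not fit the pattern.
match = re_pattern.match(name)
print(match.groups())                  # ('n', '36', 'e', '24')
print(re_pattern.match('readme.zip'))  # None
```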

Each item is uppercased, and the results are used to create a dictionary named folder with these keys and values:

 'outer': 'N36_E'
 'inner': 'NE36_24'

extract_path stores the full path, built by joining extract_rootdir with folder['outer'] and folder['inner'].
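The joined path for the sample file can be checked with ntpath, the Windows implementation of os.path that ships with Python on every platform, so the result shows backslashes even when the snippet runs on Linux or macOS. On Windows, os.path.join behaves the same way.

```python
import ntpath  # Windows flavour of os.path; importable on any OS

extract_rootdir = r'D:\Capstone\Test'
folder = {'outer': 'N36_E', 'inner': 'NE36_24'}

# Same call shape as os.path.join in the answer.
extract_path = ntpath.join(extract_rootdir, folder['outer'], folder['inner'])
print(extract_path)  # D:\Capstone\Test\N36_E\NE36_24
```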

Finally, each zip file is opened in a with statement (a context manager), which closes the archive automatically when the block ends, and all of its files are extracted to extract_path.
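A self-contained sketch of the extraction step, run against a tiny zip built in a temporary directory; the archive contents and the member name tile.bil are made up for the demo. It also shows that extractall creates extract_path, including missing parent folders, so the directories do not have to exist beforehand.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small stand-in for one of the downloaded archives.
    zip_file = os.path.join(tmp, 'n36_e024_1arc_v3_bil.zip')
    with zipfile.ZipFile(zip_file, 'w') as archive:
        archive.writestr('tile.bil', b'raster bytes')  # hypothetical member

    # Neither N36_E nor NE36_24 exists yet.
    extract_path = os.path.join(tmp, 'Test', 'N36_E', 'NE36_24')

    # The with statement closes the archive even if extraction fails.
    with zipfile.ZipFile(zip_file, 'r') as archive:
        archive.extractall(extract_path)

    extracted = sorted(os.listdir(extract_path))

print(extracted)  # ['tile.bil']
```

The variable is named archive rather than zip so that the built-in zip() function stays available inside the loop.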

Regular Expression:

re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

The pattern is compiled before the loop to avoid compiling it again on every iteration. The r before the string tells Python to interpret the string as raw, i.e. without backslash escaping. Raw strings are useful for regular expressions because the patterns themselves use backslashes (for example \A and \d).

The regular expression pattern:

 \A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)

The string the regular expression works on:

 n36_e024_1arc_v3_bil.zip

1. \A matches only at the start of the string. It is an anchor and does not match any character.
2. ([a-zA-Z]) matches any single letter. [] matches any one of the characters inside it; here, any character in the ranges a to z and A to Z. n is matched. The enclosing () captures the match as a group, which is stored in the returned sequence. The sequence is now n.
3. (\d+) matches one or more digits. \d matches any digit, and + repeats it as many times as possible. The sequence becomes n, 36.
4. _ is a literal. Since it is not enclosed in (), it is matched but not added to the sequence.
5. ([a-zA-Z]) is the same as point 2. The sequence becomes n, 36, e.
6. 0{0,2} matches the character 0 between zero and two times ({0,2}). It has no (), so it is not added to the sequence.
7. (\d+) is the same as point 3. The sequence becomes n, 36, e, 24.

The rest of the string is ignored because the pattern has reached its end. The \A anchor is used so that a match can only begin at the start of the file name, not at some later position in it.
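The behaviour of 0{0,2} and \A can be checked on a few file names; the e005, e100 and copy_ names are made up for the demonstration, and parts is a hypothetical helper. search() is used because, like the answer's findall(), it looks for a match anywhere in the string, so the anchored and unanchored patterns behave differently.

```python
import re

# The pattern from the answer, and the same pattern without the \A anchor.
anchored = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')
unanchored = re.compile(r'([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')


def parts(pattern, name):
    """Return the groups of the first match anywhere in name, or None."""
    match = pattern.search(name)
    return match.groups() if match else None


# 0{0,2} drops up to two leading zeros; the remaining digits are captured.
print(parts(anchored, 'n36_e024_1arc_v3_bil.zip'))  # ('n', '36', 'e', '24')
print(parts(anchored, 'n36_e005_1arc_v3_bil.zip'))  # ('n', '36', 'e', '5')
print(parts(anchored, 'n36_e100_1arc_v3_bil.zip'))  # ('n', '36', 'e', '100')

# \A: a tile code later in the name is only found without the anchor.
print(parts(anchored, 'copy_n36_e024.zip'))    # None
print(parts(unanchored, 'copy_n36_e024.zip'))  # ('n', '36', 'e', '24')
```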

Formatting:

After being uppercased by the list comprehension, the sequence is N, 36, E, 24.

The pattern {0}{1}_{2} uses the items in the order 0, 1, 2: 0 is N, 1 is 36 and 2 is E, giving N36_E. The _ is a literal in the pattern.
The pattern {0}{2}{1}_{3} uses the items in the order 0, 2, 1, 3: 0 is N, 2 is E, 1 is 36 and 3 is 24, giving NE36_24.
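The two format strings can be tried directly on the uppercased sequence. Unpacking with * passes the four items as positional arguments 0 to 3, and each {n} field picks one of them, so fields can be reordered freely. The f-string lines are an equivalent alternative for Python 3.6+, not what the answer uses; lat_hemi, lat, lon_hemi and lon are hypothetical names.

```python
part = ['N', '36', 'E', '24']  # the sequence after uppercasing

# {n} picks positional argument n, so fields can appear in any order.
outer = '{0}{1}_{2}'.format(*part)     # 'N36_E'
inner = '{0}{2}{1}_{3}'.format(*part)  # 'NE36_24'

# Equivalent f-strings (Python 3.6+), after naming the four items.
lat_hemi, lat, lon_hemi, lon = part
outer_f = f'{lat_hemi}{lat}_{lon_hemi}'
inner_f = f'{lat_hemi}{lon_hemi}{lat}_{lon}'

print(outer, inner)  # N36_E NE36_24
```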

References:

Python 2 and Python 3 documentation:
- re module for the regular expressions.
- str.format method for formatting strings.
- List comprehensions, used to uppercase the items in the sequence.
- zipfile module for working with zip archives.

Solution 1 (accepted 2019-06-08 10:10:09)