
How to create multiple folders with names, and extract multiple zips to each different folder, with python?

I'm having trouble writing a clean script that creates a separate directory for each of a number of zip files containing different raster data, and then extracts every zip into its new folder.

I have accomplished my task, but my code is very long and messy. I need to have folders that are labeled like NE34_E , NE35_E etc, and then within these directories, I need subfolders such as N34_24 , N34_25 etc. which the raster data will be extracted to. I have over 100 zip files that need to be extracted and placed in subfolders.

After making some changes to the way I was making directories, this is a sample of my script.

My file structure goes like this:

 N\N36_E\N36_24
 N\N36_E\N35_25
 ... etc.

Zipfile names:

 n36_e024_1arc_v3_bil.zip
 n36_e025_1arc_v3_bil.zip
 n36_e026_1arc_v3_bil.zip
 ... etc.

Python code to create the directory structure:

import os

#Create Sub directories for "NE36_"
pathname1 = "NE36_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)

#Create Sub directories for "NE37_"
pathname1 = "NE37_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)

import glob, os, re, zipfile

# Setup main paths.
zipfile_rootdir = r'D:\Capstone\Zipfiles'
extract_rootdir = r'D:\Capstone\Test'

# Process the zip files.
re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

for zip_file in glob.iglob(os.path.join(zipfile_rootdir, '*.zip')):

    # Get the parts from the base zip filename using regular expressions.
    part = re.findall(re_pattern, os.path.basename(zip_file))[0]

    # Make all items in part uppercase using a list comprehension.
    part = [item.upper() for item in part]

    # Create a dict of folder names built from the parts.
    # E.g. ['N', '36', 'E', '24'] gives outer 'N36_E' and inner 'NE36_24'.
    folder = {'outer': '{0}{1}_{2}'.format(*part),
              'inner': '{0}{2}{1}_{3}'.format(*part)}

    # Build the extraction path from each part.
    extract_path = os.path.join(extract_rootdir, folder['outer'], folder['inner'])

    # Extract all files from the zip file. The name archive is used
    # instead of zip to avoid shadowing the built-in zip function.
    with zipfile.ZipFile(zip_file, 'r') as archive:
        archive.extractall(extract_path)

There are 2 main settings to set:

  1. zipfile_rootdir is where the zip files are located.
  2. extract_rootdir is where to extract to.

The r before a string marks it as a raw string, so backslash escaping is not needed.

A regular expression is compiled and used to extract, from each zip file name, the text used to build the extraction path.

From the zip file name:

 n36_e024_1arc_v3_bil.zip 

the regular expression extracts a sequence of parts:

 n, 36, e, 24 

Each item is uppercased and used to create a dictionary named folder containing these keys and values:

 'outer': 'N36_E'
 'inner': 'NE36_24'
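
As a minimal sketch of the steps so far, the snippet below applies the same pattern and format strings to a single file name. The helper name folder_names is hypothetical and is not part of the script above.

```python
import re

# Same pattern as in the script above.
re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')


def folder_names(zip_name):
    """Return the outer and inner folder names for one zip file name.

    Hypothetical helper, for illustration only.
    """
    # findall returns a list with one tuple of groups per match.
    part = re.findall(re_pattern, zip_name)[0]
    part = [item.upper() for item in part]
    return {'outer': '{0}{1}_{2}'.format(*part),
            'inner': '{0}{2}{1}_{3}'.format(*part)}


names = folder_names('n36_e024_1arc_v3_bil.zip')
print(names['outer'], names['inner'])  # N36_E NE36_24
```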

extract_path will store the full path by joining extract_rootdir with folder['outer'] and folder['inner'] .

Finally, the zip file is opened with a context manager by use of with , and its contents are extracted.
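
The extraction step can be tried without touching real data. The sketch below assumes Python 3 (for tempfile.TemporaryDirectory) and uses a made-up archive member name; it shows that extractall creates the nested target folders itself, so no separate os.makedirs call is needed.

```python
import os
import tempfile
import zipfile

# Work in a throwaway directory so nothing is left on disk afterwards.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, 'n36_e024_1arc_v3_bil.zip')

    # Build a small stand-in archive; the member name is made up.
    with zipfile.ZipFile(zip_path, 'w') as archive:
        archive.writestr('n36_e024_1arc_v3.bil', 'dummy raster data')

    extract_path = os.path.join(tmp, 'Test', 'N36_E', 'NE36_24')

    # extractall creates extract_path and its parents if they are missing.
    # The with block closes the archive even if extraction raises an error.
    with zipfile.ZipFile(zip_path, 'r') as archive:
        archive.extractall(extract_path)

    extracted = os.listdir(extract_path)
    print(extracted)
```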


Regular Expression:

re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

The regular expression pattern is compiled before the loop to avoid compiling it again on every iteration. The r before the string informs Python that the string should be interpreted as raw, i.e. with no backslash escaping. Raw strings are useful for regular expressions because the patterns themselves use backslashes.
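
A short sketch of what the r prefix changes. The strings are taken from the script; the variable names are only for illustration.

```python
import re

# A raw string keeps backslashes exactly as typed.
raw = r'\d+'
escaped = '\\d+'  # the same pattern written without the r prefix
same_pattern = (raw == escaped)
print(same_pattern)  # True

# Windows paths benefit in the same way.
path_raw = r'D:\Capstone\Test'
path_escaped = 'D:\\Capstone\\Test'
same_path = (path_raw == path_escaped)
print(same_path)  # True

# Either spelling gives the same regular expression.
first_number = re.match(raw, '36_e024').group()
print(first_number)  # 36
```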

The regular expression pattern:

 \A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+) 

The string for the regular expression to work on:

 n36_e024_1arc_v3_bil.zip 

  1. \A matches only at the start of the string. This is an anchor and does not match any character.
  2. ([a-zA-Z]) matches any single letter. The [] matches any one of the characters listed inside it, here any character in the ranges a to z and A to Z , so n is matched. The enclosing () stores the captured group in the returned sequence, so the sequence is now n, .
  3. (\d+) matches 1 digit or more. The \d is any digit and + tells it to keep matching more. Sequence becomes n, 36, .
  4. _ is literal and, since () is not enclosing it, it is matched but not added to the sequence.
  5. ([a-zA-Z]) Same as point 2. Sequence becomes n, 36, e, .
  6. 0{0,2} matches a literal 0 , zero to 2 times {0,2} . There is no () , so it is not added to the sequence.
  7. (\d+) Same as point 3. Sequence becomes n, 36, e, 24 .
  8. The rest of the string is ignored as the pattern has reached its end. This is why \A is used: the pattern cannot start matching somewhere later in the string and pick up unwanted text toward the end.
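
The points above can be checked with a short sketch. The names anchored, unanchored and odd_name are made up for this illustration, and the second and third file names are hypothetical variations on the real one.

```python
import re

anchored = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')
unanchored = re.compile(r'([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

# Up to two leading zeros are dropped from the last number.
results = {}
for name in ('n36_e024_1arc_v3_bil.zip',
             'n36_e005_1arc_v3_bil.zip',
             'n36_e120_1arc_v3_bil.zip'):
    results[name] = anchored.findall(name)
    print(name, results[name])

# A made-up name with extra text before the coordinates.
odd_name = 'copy_n36_e024.zip'
print(anchored.findall(odd_name))    # []
print(unanchored.findall(odd_name))  # [('n', '36', 'e', '24')]
```

With \A the odd name is rejected outright, while the unanchored pattern finds the coordinates in the middle of the string.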

Formatting:

The sequence is N, 36, E, 24 after being uppercased by the list comprehension.

  1. The pattern {0}{1}_{2} is ordered 0, 1, 2 , so 0 is N , 1 is 36 and 2 is E to become N36_E . The _ is literal in the pattern.
  2. The pattern {0}{2}{1}_{3} is ordered 0, 2, 1, 3 , so 0 is N , 2 is E , 1 is 36 and 3 is 24 to become NE36_24 .
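
The two format patterns can be run on their own as below. The second half is an alternative spelling with f-strings (Python 3.6+); the variable names lat_hemi, lat, lon_hemi and lon are assumptions chosen for readability and do not appear in the script.

```python
part = ['N', '36', 'E', '24']

# Positional indices pick items from the unpacked sequence in any order.
# Items that a format string does not mention are simply ignored.
outer = '{0}{1}_{2}'.format(*part)
inner = '{0}{2}{1}_{3}'.format(*part)
print(outer, inner)  # N36_E NE36_24

# Alternative: unpack into named variables and use f-strings.
lat_hemi, lat, lon_hemi, lon = part
outer_named = f'{lat_hemi}{lat}_{lon_hemi}'
inner_named = f'{lat_hemi}{lon_hemi}{lat}_{lon}'
print(outer_named, inner_named)  # N36_E NE36_24
```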

References:

  • Python 2:
    • re module for the regular expressions.
    • format method for the formatting of strings.
    • list comprehensions used to uppercase items in the sequence.
    • zipfile module for working with zip archives.
  • Python 3:
    • re module for the regular expressions.
    • format method for the formatting of strings.
    • list comprehensions used to uppercase items in the sequence.
    • zipfile module for working with zip archives.
