Extract substrings using regex from filenames given by os.walk

I'm basically trying to grab three pieces of information from this os.walk:

  1. Is there a folder with the name unit in it? If so, I want to know the contents of the folder.
  2. Within those contents, is there a folder name with the format \d\d\d\d\d\d_DAY\d\d ? If so, I want to extract the first (\d\d\d\d\d\d) and save it as date.
  3. Further within that folder tree, are there MXF files? If so, move the contents of the previous folder to: 'Users/davealterman/Desktop/Volumes/HOW_TO_OCM/RAID OCM/FS4/' + 'DATE'

I am new to coding and this has been a headache. Any help would be appreciated. I know this code doesn't make sense, but I'm a bit frustrated.


import os, glob, re, shutil 
from pathlib import Path

FS5_path = 'Users/davealterman/Desktop/Volumes/HOW_TO_OCM/RAID OCM/FS4'

home_path = '/Users/davealterman/Desktop/Volumes/HOW_TO_OCM/_FROM PRODUCTION'

os.chdir(home_path)

subList = []
i = -1
for dirs, subs, files in os.walk(home_path):

    for sub in subs:
        print(sub)
        subList.append(sub)
        i + 1
        formatRegex = re.compile(r'(\d{6})(_DAY)(\d{2})')
        mo = formatRegex.search(sub)
        mo.group()

Give this one a shot.

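import os, re, shutil

# Kept as written in the question: without a leading slash this path is
# relative to the working directory. Prepend '/' if an absolute path is intended.
FS5_path = 'Users/davealterman/Desktop/Volumes/HOW_TO_OCM/RAID OCM/FS4'

home_path = '/Users/davealterman/Desktop/Volumes/HOW_TO_OCM/_FROM PRODUCTION'

# Compile the patterns once, outside the loop.
# Six digits, then _DAY, then two digits; group 1 captures the date.
date_regex = re.compile(r'(\d{6})_DAY\d{2}')
# MXF files, matched regardless of the extension's case.
mxf_regex = re.compile(r'.*\.mxf', re.IGNORECASE)

for root, subs, files in os.walk(home_path):
    # Is there a folder with the name unit in it? If so, I want to know the contents of the folder.

    # filter folders containing `unit`
    searching_for = 'unit'
    matched_folders = filter(lambda folder_name: searching_for in folder_name, subs)
    for folder in matched_folders:
        # Join with the directory currently being walked, not with home_path.
        print(os.listdir(os.path.join(root, folder)))

    # Within those contents, is there a folder name with the format: \d\d\d\d\d\d_DAY\d\d? If so, I want to extract the first (\d\d\d\d\d\d) and save it as date.
    # The MXF files sit below the dated folder, so check every folder on the
    # path from home_path down to the current directory.
    path_parts = os.path.relpath(root, home_path).split(os.sep)
    matches = [date_regex.fullmatch(part) for part in path_parts]
    dates = [match.group(1) for match in matches if match]
    if not dates:
        continue  # not inside a dated folder, nothing to move from here
    date = dates[0]

    # Further within that folder tree, are there MXF files? If so, move them.
    mxf_files = filter(lambda file: mxf_regex.fullmatch(file), files)
    for file in mxf_files:
        dest_dir = os.path.join(FS5_path, date)
        os.makedirs(dest_dir, exist_ok=True)
        # Use full paths so the move does not depend on the working directory.
        shutil.move(os.path.join(root, file), os.path.join(dest_dir, file))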
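
If you want to check the matching logic before anything gets moved, here is a minimal sketch that only builds a list of planned moves. The helper names `extract_date` and `plan_moves` are made up for this illustration, and it assumes the dated folder can sit anywhere between the starting folder and the MXF files. Note that `fullmatch` returns `None` when a name does not fit the pattern, so the result has to be checked before calling `.group()`.

```python
import os
import re

# Six digits, then _DAY, then two digits; group 1 captures the date.
DATE_FOLDER = re.compile(r'(\d{6})_DAY\d{2}')
# File names ending in .mxf, in any letter case.
MXF_FILE = re.compile(r'.*\.mxf', re.IGNORECASE)


def extract_date(folder_name):
    """Return the six-digit date from a name like '210314_DAY02', else None."""
    match = DATE_FOLDER.fullmatch(folder_name)
    return match.group(1) if match else None


def plan_moves(source_root, dest_root):
    """Return (source, destination) pairs for MXF files below dated folders.

    Nothing is moved here; the caller can print the plan and check it first.
    """
    moves = []
    for root, _subs, files in os.walk(source_root):
        # Folder names between source_root and the current directory.
        parts = os.path.relpath(root, source_root).split(os.sep)
        dates = [date for date in map(extract_date, parts) if date]
        if not dates:
            continue  # not inside a dated folder
        for name in files:
            if MXF_FILE.fullmatch(name):
                source = os.path.join(root, name)
                destination = os.path.join(dest_root, dates[0], name)
                moves.append((source, destination))
    return moves
```

Calling `plan_moves(home_path, FS5_path)` and printing the result shows which files would go where. Once the list looks right, each pair can be passed to `shutil.move` after creating the destination folder with `os.makedirs(..., exist_ok=True)`.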

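Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.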
