
Extract substrings using regex from filenames given by os.walk

I'm trying to grab three pieces of information from this os.walk:

  1. Is there a folder with unit in its name? If so, I want to know the contents of that folder.
  2. Within those contents, is there a folder whose name has the format \d\d\d\d\d\d_DAY\d\d? If so, I want to extract the first part (\d\d\d\d\d\d) and save it as date.
  3. Further within that folder tree, are there MXF files? If so, move the contents of the previous folder to: 'Users/davealterman/Desktop/Volumes/HOW_TO_OCM/RAID OCM/FS4/' + 'DATE'

I am new to coding and this has been a headache. Any help would be appreciated. I know this code doesn't make sense, but I'm a bit frustrated.


import os, glob, re, shutil 
from pathlib import Path

FS5_path = 'Users/davealterman/Desktop/Volumes/HOW_TO_OCM/RAID OCM/FS4'

home_path = '/Users/davealterman/Desktop/Volumes/HOW_TO_OCM/_FROM PRODUCTION'

os.chdir(home_path)

subList = []
i = -1
for dirs, subs, files in os.walk(home_path):

    for sub in subs:
        print(sub)
        subList.append(sub)
        i + 1
        formatRegex = re.compile(r'(\d{6})(_DAY)(\d{2})')
        mo = formatRegex.search(sub)
        mo.group()

Give this one a shot.

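import os, re, shutil

# Destination root. A leading slash is added here on the assumption that an
# absolute path (like home_path) was intended.
FS5_path = '/Users/davealterman/Desktop/Volumes/HOW_TO_OCM/RAID OCM/FS4'

home_path = '/Users/davealterman/Desktop/Volumes/HOW_TO_OCM/_FROM PRODUCTION'

# Folder names such as 210315_DAY01; group 1 captures the six-digit date.
date_regex = re.compile(r'(\d{6})_DAY\d{2}')
# MXF files, matched case-insensitively.
mxf_regex = re.compile(r'.*\.mxf', re.IGNORECASE)

for dirs, subs, files in os.walk(home_path):
    # Is there a folder with the name unit in it? If so, I want to know the contents of the folder.

    # filter folders containing `unit`
    searching_for = 'unit'
    matched_folders = filter(lambda folder_name: searching_for in folder_name, subs)
    for folder in matched_folders:
        # Join with dirs (the directory currently being walked), not home_path,
        # so that nested folders resolve correctly.
        print(os.listdir(os.path.join(dirs, folder)))

    # Within those contents, is there a folder name with the format: \d\d\d\d\d\d_DAY\d\d? If so, I want to extract the first (\d\d\d\d\d\d) and save it as date.
    # The MXF files sit below the date folder, so look for the date in the
    # components of the current path rather than in subs.
    date = None
    for part in dirs.split(os.sep):
        mo = date_regex.fullmatch(part)
        if mo:
            date = mo.group(1)
    if date is None:
        continue  # not inside a dated folder, nothing to move from here

    # Further within that folder tree, are there MXF files? If so, move them to FS5_path/date.
    mxf_files = filter(lambda file: mxf_regex.fullmatch(file), files)
    for file in mxf_files:
        dest_dir = os.path.join(FS5_path, date)
        os.makedirs(dest_dir, exist_ok=True)
        # os.walk yields bare file names, so build the full source path.
        shutil.move(os.path.join(dirs, file), os.path.join(dest_dir, file))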
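
Before moving anything for real, it may help to do a dry run that only lists what would be moved. Below is a minimal sketch of that idea. The helper names extract_date and plan_moves are hypothetical (they are not from the question), the folder names in the demo are made up, and the sketch assumes that only .mxf files should be moved and that they are grouped by the date of the nearest dddddd_DAYdd folder above them. It does not check for a unit folder.

```python
import os
import re
import tempfile

# Folder names such as 210315_DAY01; group 1 is the six-digit date.
DATE_DIR = re.compile(r'(\d{6})_DAY\d{2}')


def extract_date(dirpath):
    """Return the date from the deepest path component that looks like
    dddddd_DAYdd, or None when no component matches."""
    for part in reversed(os.path.normpath(dirpath).split(os.sep)):
        match = DATE_DIR.fullmatch(part)
        if match:
            return match.group(1)
    return None


def plan_moves(src_root, dest_root):
    """Hypothetical helper: list (source, destination) pairs for every
    MXF file below a date folder, without touching the file system."""
    moves = []
    for dirpath, _subdirs, filenames in os.walk(src_root):
        # Only look at the part of the path below src_root.
        date = extract_date(os.path.relpath(dirpath, src_root))
        if date is None:
            continue
        for name in sorted(filenames):
            if name.lower().endswith('.mxf'):
                moves.append((os.path.join(dirpath, name),
                              os.path.join(dest_root, date, name)))
    return moves


if __name__ == '__main__':
    # Build a throwaway tree (made-up names) to try the helper on.
    with tempfile.TemporaryDirectory() as root:
        clip_dir = os.path.join(root, 'unit_A', '210315_DAY01', 'CLIPS')
        os.makedirs(clip_dir)
        open(os.path.join(clip_dir, 'A001.MXF'), 'w').close()
        for src, dst in plan_moves(root, 'FS4'):
            print(src, '->', dst)
```

Once the printed pairs look right, each pair could be handed to os.makedirs (for the destination folder) and shutil.move.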

