How do I search a directory and find files that match a regular expression?

Question

I recently started using Python, and I'm having trouble searching directories and matching files against a regular expression I created.

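Basically, I want it to scan all the directories inside another directory, find every file ending in .zip, .rar, or .r01, and then run different commands depending on which kind of file it is.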

import os, re

rootdir = "/mnt/externa/Torrents/completed"

for subdir, dirs, files in os.walk(rootdir):
    if re.search('(w?.zip)|(w?.rar)|(w?.r01)', files):
        print "match: " . files

Answer 1

import os
import re

rootdir = "/mnt/externa/Torrents/completed"
# Escape the dots so only real extensions match (e.g. "foozip" does not)
regex = re.compile(r'(.*\.zip$)|(.*\.rar$)|(.*\.r01$)')

for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.match(file):
            print(file)

The code below answers the following question from the comments:

This works really well. Is there a way to do one thing if the match is on regex group 1, and something else if the match is on regex group 2? – 尼爾尼爾森

import os
import re

# One capturing group per extension; dots escaped so "foozip" does not match
regex = re.compile(r'(.*\.zip$)|(.*\.rar$)|(.*\.r01$)')

for root, dirs, files in os.walk("../Documents"):
    for file in files:
        res = regex.match(file)
        if res:
            # Only the group of the alternative that matched is non-None
            if res.group(1):
                print("ZIP", file)
            elif res.group(2):
                print("RAR", file)
            elif res.group(3):
                print("R01", file)

There is probably a better way to do this, but this works.
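Instead of testing each group number in turn, one option is to give every alternative a named group and read `Match.lastgroup`, which holds the name of the last group that matched. This is a minimal sketch, assuming the handler functions (which only print here) stand in for whatever commands you actually want to run. `ARCHIVE_RE`, `classify`, `HANDLERS`, and `process_tree` are hypothetical names, and `re.IGNORECASE` is an added assumption so that `FILE.ZIP` also matches:

```python
import os
import re

# Each alternative is a named group; only the one that matched is set,
# so m.lastgroup is "zip", "rar" or "r01".
ARCHIVE_RE = re.compile(
    r".*\.(?:(?P<zip>zip)|(?P<rar>rar)|(?P<r01>r01))$", re.IGNORECASE
)


def classify(filename):
    """Return 'zip', 'rar', 'r01', or None for a bare file name."""
    m = ARCHIVE_RE.match(filename)
    return m.lastgroup if m else None


# Hypothetical handlers: replace the print calls with the real commands.
HANDLERS = {
    "zip": lambda path: print("ZIP", path),
    "rar": lambda path: print("RAR", path),
    "r01": lambda path: print("R01", path),
}


def process_tree(rootdir):
    """Walk rootdir recursively and dispatch each matching file to its handler."""
    for root, _dirs, files in os.walk(rootdir):
        for name in files:
            kind = classify(name)
            if kind is not None:
                HANDLERS[kind](os.path.join(root, name))
```

Adding a new file type then only needs another named group and another entry in the dictionary, rather than another `if` branch.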

Answer 2

Since you are a beginner, I'd recommend using glob instead of a hastily written file-walking regex matcher.

Function snippets using `glob` and a `file-walking-regex matcher`

The snippet below contains two file-search functions (one using glob, the other using a custom file-walking regex matcher). It also contains a "stopwatch" decorator that times both functions.

import os
import re
import glob
import time
from datetime import timedelta
from functools import wraps

def stopwatch(method):
    """Decorator that prints how long each call to `method` takes."""
    @wraps(method)
    def timed(*args, **kw):
        ts = time.perf_counter()
        result = method(*args, **kw)
        te = time.perf_counter()
        duration = timedelta(seconds=te - ts)
        print(f"{method.__name__}: {duration}")
        return result
    return timed

@stopwatch
def get_filepaths_with_oswalk(root_path: str, file_regex: str):
    """Recursively collect paths of files whose names match `file_regex`."""
    files_paths = []
    pattern = re.compile(file_regex)
    for root, directories, files in os.walk(root_path):
        for file in files:
            if pattern.match(file):
                files_paths.append(os.path.join(root, file))
    return files_paths


@stopwatch
def get_filepaths_with_glob(root_path: str, file_pattern: str):
    """Match a shell-style wildcard (not a regex) inside root_path only."""
    return glob.glob(os.path.join(root_path, file_pattern))

Comparing the run times of the functions above

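Using the two functions above to find the 5,076 files that match filename_*.csv in a directory named root_path (which contains 66,948 files). Note that filename_*.csv is a glob wildcard pattern; the os.walk version is given the roughly equivalent regex filename_(.*).csv: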

>>> glob_files = get_filepaths_with_glob(root_path, 'filename_*.csv')
get_filepaths_with_glob: 0:00:00.176400

>>> oswalk_files = get_filepaths_with_oswalk(root_path,'filename_(.*).csv')
get_filepaths_with_oswalk: 0:03:29.385379

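The glob approach is much faster, and its code is also shorter. Keep in mind that glob.glob as written only looks inside root_path itself, while os.walk also descends into every subdirectory, so unless root_path has no subdirectories the two calls are not doing the same amount of work.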

For your case

In your case, you could probably use something like the following to collect the *.zip, *.rar, and *.r01 files:

files = []
for ext in ['*.zip', '*.rar', '*.r01']:
    files += get_filepaths_with_glob(root_path, ext)
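get_filepaths_with_glob only searches the top level of root_path, but the question asks to scan every directory below rootdir. A minimal recursive sketch uses glob's `**` pattern with `recursive=True` (available since Python 3.5). The function name `find_archives_recursive` is hypothetical and not part of the original answer; `glob.escape` is used on the root path in case it contains characters such as `[` that glob would otherwise treat as wildcards:

```python
import glob
import os


def find_archives_recursive(root_path, extensions=("zip", "rar", "r01")):
    """Return every file under root_path, at any depth, with one of the given extensions."""
    files = []
    for ext in extensions:
        # "**" matches zero or more directory levels only when recursive=True;
        # escaping root_path stops glob from interpreting characters like "[" in it.
        pattern = os.path.join(glob.escape(root_path), "**", "*." + ext)
        files += glob.glob(pattern, recursive=True)
    return sorted(files)
```

Like the rest of glob, this skips names starting with a dot, and on POSIX systems the extension match is case-sensitive.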

Answer 3

Here is an alternative approach using pathlib's glob.

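from pathlib import Path

rootdir = "/mnt/externa/Torrents/completed"
for extension in 'zip rar r01'.split():
    # '**/' makes the glob descend into every subdirectory
    for path in Path(rootdir).glob('**/*.' + extension):
        # path is a Path object, so pass it to print instead of adding it to a str
        print("match:", path)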
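The loop above runs one glob per extension. A minimal alternative sketch, assuming the extension check can be done on `Path.suffix`, makes a single recursive pass with `Path.rglob` and compares each file's suffix against a set. The name `find_by_suffix` and the case-insensitive comparison via `.lower()` are assumptions added for illustration:

```python
from pathlib import Path

ARCHIVE_SUFFIXES = {".zip", ".rar", ".r01"}


def find_by_suffix(rootdir, suffixes=ARCHIVE_SUFFIXES):
    """Return the files under rootdir, at any depth, whose last suffix is in `suffixes`."""
    matches = []
    # rglob("*") yields every entry below rootdir, files and directories alike
    for path in Path(rootdir).rglob("*"):
        # .suffix is only the final extension: "movie.part1.rar" -> ".rar"
        if path.is_file() and path.suffix.lower() in suffixes:
            matches.append(path)
    return sorted(matches)
```

Usage: `for path in find_by_suffix("/mnt/externa/Torrents/completed"): print("match:", path)`. The `is_file()` check keeps a directory that happens to be named something like `old.zip` out of the results.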

Answer 4

I would do it like this:

import re
from pathlib import Path

def glob_re(path, regex="", glob_mask="**/*", inverse=False):
    """Return paths under `path` matching glob_mask whose full path string
    matches `regex` (or does not match it, when inverse=True)."""
    p = Path(path)
    if inverse:
        res = [str(f) for f in p.glob(glob_mask) if not re.search(regex, str(f))]
    else:
        res = [str(f) for f in p.glob(glob_mask) if re.search(regex, str(f))]
    return res

Note: by default it scans all subdirectories recursively. If you only want to scan the given directory itself, explicitly pass glob_mask="*".
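Because glob_re applies the regex to `str(f)`, the full path, a pattern such as `archive` also matches every file inside a folder named archive. And since the default glob_mask `**/*` yields directories too, a folder named `backup.zip` is returned for `\.zip$`. A minimal variant, with the hypothetical name `glob_re_names`, keeps the same parameters but tests only the file name and skips directories:

```python
import re
from pathlib import Path


def glob_re_names(path, regex="", glob_mask="**/*", inverse=False):
    """Like glob_re, but test `regex` against each file's name only and skip directories."""
    pattern = re.compile(regex)
    return [
        str(f)
        for f in Path(path).glob(glob_mask)
        # bool(...) != inverse keeps matches normally and non-matches when inverse=True
        if f.is_file() and bool(pattern.search(f.name)) != inverse
    ]
```

For the question's case this would be called as `glob_re_names(rootdir, r"\.(zip|rar|r01)$")`.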

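Question

4 answers

Answer 1
score 16, accepted, 2016-09-02 13:55:10

Answer 2
score 6, 2020-05-25 04:00:16

Function snippets using `glob` and a `file-walking-regex matcher`

Comparing the run times of the functions above

For your case

Answer 3
score 5, 2020-01-29 16:18:42

Answer 4
score 1, 2020-12-09 12:07:32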
