Regex to find a specific file path

Question

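I am trying to check whether the file testing.txt exists.

The first file exists in: sub/hbc_cube/college/

The second file exists in: sub/hbc/college

However, when searching for where the file is located, I cannot assume the string "hbc", because the name can vary from user to user. So I am trying to find a way to report:

PASS if the path is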

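sub/*_cube/college/

FAIL if the path is

sub/*/college

But I cannot use the glob character (*), because (*) will count _cube as a fail. I am trying to work out a regular expression that only detects a plain string, not a string with an underscore (e.g. hbc_cube).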

I have tried using a Python regex dictionary, but I could not figure out the correct regex to use:

file_list = lookupfiles(['testing.txt'], dirlist=['sub/'])
for file in file_list:
    if str(file).find('_cube/college/'):  # hbc_cube/college
        print("pass")
    if str(file).find('*/college/'):      # hbc/college
        print("fail")

If the file exists in both locations, I only want "fail" printed. The problem is that the * character is counting hbc_cube.

Answer 1

The glob module is your friend. You don't even need to match against multiple directories yourself; glob will do it for you:

from glob import glob

# testing.txt sits in the college/ subfolder, so the pattern has to include it
testfiles = glob("sub/*/college/testing.txt")

if len(testfiles) > 0 and all("_cube/" in path for path in testfiles):
    print("Pass")
else:
    print("Fail")

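In case it is not obvious, the test all("_cube/" in path for path in testfiles) takes care of this requirement:

If the file exists in both locations, I only want "fail" printed. The problem is that the * character is counting hbc_cube.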

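If any of the matched paths does not contain _cube, the test fails. Since you want to know about the files that cause the test to fail, you cannot search only for files in paths matching *_cube; you must retrieve both the good and the bad paths, and check them as shown.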
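To make the "retrieve both, then check" idea concrete, here is a minimal sketch that separates the good paths from the bad ones so the offending files can be reported. The helper name split_by_cube and the sample paths are made up for illustration; in practice the list would come from glob(), and the "_cube/" test assumes forward-slash separators.

```python
def split_by_cube(paths):
    """Split paths into those under a *_cube directory and the rest."""
    good = [p for p in paths if "_cube/" in p]
    bad = [p for p in paths if "_cube/" not in p]
    return good, bad

# Hypothetical glob results, written with forward slashes
matches = [
    "sub/hbc_cube/college/testing.txt",
    "sub/hbc/college/testing.txt",
]

good, bad = split_by_cube(matches)

# Pass only if something matched and every match is under a *_cube directory
print("Pass" if matches and not bad else "Fail")
for path in bad:
    print("offending path:", path)
```

With both files present this prints Fail and names the path outside the _cube directory, which is the behaviour asked for in the question.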

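Of course you could shorten the above code, or, depending on the specifics of your case, generalize it to build the glob pattern by combining options from a list of folders, a list of files, and so on.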
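One possible way to generalize, sketched under the assumption that every file lives at <folder>/<anything>/college/<filename>. The function build_patterns and its subdir parameter are hypothetical names, not part of glob:

```python
from glob import glob
from itertools import product

def build_patterns(folders, filenames, subdir="college"):
    """Build one glob pattern per (folder, filename) combination."""
    return [
        f"{folder.rstrip('/')}/*/{subdir}/{name}"
        for folder, name in product(folders, filenames)
    ]

patterns = build_patterns(["sub/"], ["testing.txt"])
print(patterns)  # ['sub/*/college/testing.txt']

# Collect the matches of all patterns into one flat list
testfiles = [path for pattern in patterns for path in glob(pattern)]
```

The resulting testfiles list can then go through the same all("_cube/" in path ...) check as before.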

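Note that "full regular expressions" are provided by the re module, and they are different from the simpler "globs" used by the glob module. Don't mix them up if you go and check the documentation.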
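A small sketch of the difference, using the two paths from the question. The regex is only one possible reading of "a string without an underscore": it assumes the failing directory name contains no underscore at all. fnmatchcase is used here just to apply a glob-style pattern to plain strings without touching the file system.

```python
import re
from fnmatch import fnmatchcase

paths = [
    "sub/hbc_cube/college/testing.txt",
    "sub/hbc/college/testing.txt",
]

# Glob-style pattern: * matches any run of characters, so hbc_cube matches too
glob_hits = [p for p in paths if fnmatchcase(p, "sub/*/college/testing.txt")]

# Regex: [^/_]+ matches a single directory name that contains no underscore
no_underscore = re.compile(r"sub/[^/_]+/college/testing\.txt")
regex_hits = [p for p in paths if no_underscore.fullmatch(p)]

print(glob_hits)   # both paths
print(regex_hits)  # only sub/hbc/college/testing.txt
```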

Answer 2

The os module is well suited to this:

import os

# This assumes your current working directory has sub in it
for root, dirs, files in os.walk('sub'):
    for file in files:
        if file=='testing.txt':
            # print the file and the directory it's in
            print(os.path.join(root, file))

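os.walk yields a three-element tuple on each iteration: the root directory, the directories in the current folder, and the files in the current folder. To print the full path, join root (the directory currently being visited) with the file name.

For example, on my machine: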

import os

for root, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith('ipynb'):
            print(os.path.join(root, file))

# prints
/Users/mm92400/Salesforce_Repos/DataExplorationClustersAndTime.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled1.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationExploratory.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled3.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled4.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled2.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationClusterAnalysis.ipynb
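
To tie os.walk back to the question, here is a rough sketch that builds a throwaway copy of the sub/ tree in a temporary directory and lists the folders containing testing.txt. The helper find_dirs is a made-up name and the directory layout is assumed from the question:

```python
import os
import tempfile

def find_dirs(top, filename):
    """Return, sorted, the directories under top that directly contain filename."""
    return sorted(root for root, dirs, files in os.walk(top) if filename in files)

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the two locations from the question inside a temporary folder
    for name in ("hbc_cube", "hbc"):
        folder = os.path.join(tmp, "sub", name, "college")
        os.makedirs(folder)
        open(os.path.join(folder, "testing.txt"), "w").close()

    found = find_dirs(os.path.join(tmp, "sub"), "testing.txt")
    # Show the results relative to the temporary folder, with forward slashes
    relative = [os.path.relpath(d, tmp).replace(os.sep, "/") for d in found]
    print(relative)  # ['sub/hbc/college', 'sub/hbc_cube/college']
```

Note that os.walk on its own only finds the files; the pass/fail decision still needs a check on the directory name, for example whether the parent of college ends with _cube.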

Answer 3

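Use pathlib to parse your path and take the parent of the Path object, which drops the /college part, then check whether the resulting path string ends with _cube: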

from pathlib import Path

file_list = lookupfiles(['testing.txt'], dirlist=['sub/'])
for file in file_list:
    path = Path(file)
    if str(path.parent).endswith('_cube'):
        print('pass')
    else:
        print('fail')

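Edit:

If the file variable in the for loop contains the file name (sub/*_cube/college/testing.txt), just call parent twice on the path: path.parent.parent

Another approach is to filter the files inside lookupfiles(), provided you have access to that function and can edit it.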
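
A minimal sketch of the parent-twice variant, assuming each entry is a full file path such as sub/hbc_cube/college/testing.txt. The helper in_cube_dir and the sample list are illustrative only. PurePosixPath is used instead of Path so that the forward-slash strings are parsed the same way on every platform.

```python
from pathlib import PurePosixPath

def in_cube_dir(file_path):
    """True if the directory two levels above the file name ends with _cube."""
    path = PurePosixPath(file_path)
    # .parent drops the file name, .parent again drops the college part
    return path.parent.parent.name.endswith("_cube")

# Hypothetical stand-in for the result of lookupfiles()
file_list = [
    "sub/hbc_cube/college/testing.txt",
    "sub/hbc/college/testing.txt",
]

failing = [f for f in file_list if not in_cube_dir(f)]
print("fail" if failing or not file_list else "pass")
```

Collecting the failing paths first and printing once covers the case where the file exists in both locations and only "fail" should be printed.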

3 solutions

Solution 1
1 2019-03-28 19:21:49

Solution 2
0 2019-03-28 19:14:49

Solution 3
0 2019-03-28 19:15:39
