RegEx to find specific file path

I am trying to check whether the file testing.txt exists.

The first file exists in: sub/hbc_cube/college/

The second file exists in: sub/hbc/college

However, when searching for where the file exists, I CANNOT assume the string 'hbc' because the name may be different depending on the user. So I am trying to find a way to

PASS if the path is

sub/_cube/college/

FAIL if the path is

sub/*/college

But I cannot use a glob character (*) because the * also matches hbc_cube, so the _cube path would count as failing. I am trying to figure out a regular expression that matches a plain directory name but not one containing an underscore (hbc_cube, for example).

I have tried using the Python regex dictionary, but I have not been able to figure out the correct regex to use.

file_list = lookupfiles(['testing.txt'], dirlist=['sub/'])  # lookupfiles is the asker's own helper
for file in file_list:
    if str(file).find('_cube/college/') != -1:  # hbc_cube/college
        print("pass")
    if str(file).find('*/college/') != -1:      # hbc/college
        print("fail")

If the file exists in both locations I want only "fail" to print. The problem is the * character is counting hbc_cube.

The glob module is your friend. You don't even need to match against multiple directories; glob will do it for you:

from glob import glob

testfiles = glob("sub/*/college/testing.txt")  # * matches one directory level, e.g. hbc or hbc_cube

if len(testfiles) > 0 and all("_cube/" in path for path in testfiles):
    print("Pass")
else:
    print("Fail")

In case it is not obvious, the test all("_cube/" in path for path in testfiles) will take care of this requirement:

If the file exists in both locations I want only "fail" to print. The problem is the * character is counting hbc_cube.

If some of the paths that matched do not contain _cube, the test fails. Since you want to know about files that cause the test to fail, you cannot search solely for files in a path containing *_cube; you must retrieve both good and bad paths and inspect them as shown.
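The pass/fail rule can be pulled out into a small function and checked against hand-written path lists without touching the file system. This is a minimal sketch; the function name verdict is hypothetical, and the paths assume forward slashes (as glob returns on POSIX systems).

```python
def verdict(paths):
    """Return "Pass" only if at least one path matched and every matched
    path contains a directory whose name ends in "_cube"."""
    if paths and all("_cube/" in path for path in paths):
        return "Pass"
    return "Fail"

print(verdict(["sub/hbc_cube/college/testing.txt"]))
print(verdict(["sub/hbc/college/testing.txt"]))
print(verdict(["sub/hbc_cube/college/testing.txt", "sub/hbc/college/testing.txt"]))
print(verdict([]))
```

The third call is the "file exists in both locations" case: one bad path is enough to make the whole check fail, and an empty result (no file found at all) also fails.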

Of course you can shorten the above code, or generalize it to construct the globbed path by combining options from a list of folders and a list of files, etc., depending on the particulars of your case.

Note that there are "full regular expressions", provided by the re module, and the simpler "globs" used by the glob module. If you go check the documentation, don't confuse them.
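To make the difference concrete for this question: in a glob, * matches any run of characters within one path component, underscores included. In a regular expression you can exclude the underscore explicitly with a character class such as [^/_]+. The sketch below is illustrative only; the pattern names PLAIN_DIR and CUBE_DIR are made up, and the paths assume forward slashes.

```python
import re

# Directory name made only of characters other than "/" and "_" (e.g. "hbc").
PLAIN_DIR = re.compile(r"sub/[^/_]+/college/testing\.txt")
# Directory name ending in "_cube" (e.g. "hbc_cube"); the prefix cannot contain "/".
CUBE_DIR = re.compile(r"sub/[^/]*_cube/college/testing\.txt")

paths = [
    "sub/hbc_cube/college/testing.txt",
    "sub/hbc/college/testing.txt",
]

for path in paths:
    # fullmatch requires the whole string to match, not just a prefix.
    if CUBE_DIR.fullmatch(path):
        print(path, "-> pass")
    elif PLAIN_DIR.fullmatch(path):
        print(path, "-> fail")
```

Note that a directory such as john_doe (an underscore, but no _cube suffix) matches neither pattern, so you would need to decide how to treat that case.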

The os module is well suited for this:

import os

# This assumes your current working directory has sub in it
for root, dirs, files in os.walk('sub'):
    for file in files:
        if file=='testing.txt':
            # print the file and the directory it's in
            print(os.path.join(root, file))

os.walk yields a three-element tuple for each directory it visits: the path of that directory (root), the names of its subdirectories, and the names of the files it contains. To get the full path of a file, join root with the file name.

For example, on my machine:

for root, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith('.ipynb'):
            print(os.path.join(root, file))

# output:
/Users/mm92400/Salesforce_Repos/DataExplorationClustersAndTime.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled1.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationExploratory.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled3.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled4.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled2.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationClusterAnalysis.ipynb
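Applied to the question's layout, a sketch might look like the following. It builds a throwaway directory tree with tempfile so it runs anywhere; the helper name find_testing_files is hypothetical, and it assumes the user-specific directory is always two levels above testing.txt (sub/<user dir>/college/testing.txt).

```python
import os
import tempfile

def find_testing_files(top, name="testing.txt"):
    """Yield the full path of every file called `name` below `top`."""
    for root, dirs, files in os.walk(top):
        if name in files:
            yield os.path.join(root, name)

# Build a throwaway tree that mirrors the question's layout.
with tempfile.TemporaryDirectory() as tmp:
    for user_dir in ("hbc_cube", "hbc"):
        college = os.path.join(tmp, "sub", user_dir, "college")
        os.makedirs(college)
        open(os.path.join(college, "testing.txt"), "w").close()

    found = sorted(find_testing_files(os.path.join(tmp, "sub")))
    # The directory two levels above the file is the user-specific one.
    owners = [os.path.basename(os.path.dirname(os.path.dirname(p))) for p in found]
    # Pass only if something was found and every owner directory ends in "_cube".
    result = "pass" if found and all(o.endswith("_cube") for o in owners) else "fail"

print(owners)
print(result)
```

Because the file exists under both hbc_cube and hbc here, only "fail" is reported, which is the behaviour the question asks for.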

Use pathlib to parse your path. The path object's parent drops the last component (the /college part), so you can then check whether the remaining path string ends with _cube:

from pathlib import Path

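file_list = lookupfiles(['testing.txt'], dirlist=['sub/'])  # lookupfiles is the asker's own helper
for file in file_list:
    path = Path(file)
    # path.parent drops the last component, e.g. sub/hbc_cube/college/ -> sub/hbc_cube
    if str(path.parent).endswith('_cube'):
        print('pass')
    else:
        print('fail')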

Edit:

If the file variable in the for loop includes the file name (sub/_cube/college/testing.txt), call parent twice on the path: path.parent.parent.
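A small sketch of the parent.parent variant, assuming the layout sub/<user dir>/college/testing.txt. It uses PurePosixPath so forward-slash strings are parsed the same way on any OS without touching the disk; with real files from the file system you would use Path instead. The function name is_cube_path is hypothetical, and it checks .name (the last component) rather than the full string, which is equivalent here.

```python
from pathlib import PurePosixPath

def is_cube_path(file_path):
    """True if the directory two levels above the file ends in "_cube"."""
    path = PurePosixPath(file_path)
    # path.parent        -> sub/<user dir>/college
    # path.parent.parent -> sub/<user dir>
    return path.parent.parent.name.endswith("_cube")

for f in ["sub/hbc_cube/college/testing.txt", "sub/hbc/college/testing.txt"]:
    print(f, "pass" if is_cube_path(f) else "fail")
```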

Another approach would be to filter the files inside lookupfiles(), if you have access to that function and can edit it.
