
A simple python script to 'search text files for whole words' - with GUI

I am currently building a small program that allows searching for phrases in actors' dialog, using transcribed text files from video clips. I run into a few issues as described below...

  1. Create user input:

     # Get the SEARCH WINDOW
     import tkinter as tk
     from tkinter import simpledialog as sd

     root = tk.Tk()
     root.withdraw()
     root.option_add('*background', '#111111')
     root.option_add('*Entry*background', '#999999')
     searchPhrase = sd.askstring(
         "PhraseFinder v0.1 | filmwerk.nyc 2021 ",
         "Type keyword, or entire phrase, to search...",
         parent=root,
     )

    This seems to work fine. The user input is stored in searchPhrase.

  2. Take user input from above ( searchPhrase ) and search a directory containing 800 text files ('whole word' search only - 'ignore case').

     # Do THE SEARCH, based on user input
     import glob
     import os
     import re

     rootDir = '/Volumes/audio/TRANSCRIBE/OUT'
     os.chdir(rootDir)
     for files in glob.glob("*.txt"):
         with open(files) as f:
             contents = f.read()
             if (re.search(r'\b' + re.escape(searchPhrase) + r'\b', contents, re.IGNORECASE)):
                 print(f)

    This outputs:

     <_io.TextIOWrapper name='FW_A01_2020-12-01_1856_C0004.txt' mode='r' encoding='US-ASCII'>
     <_io.TextIOWrapper name='FW_A01_2020-12-01_1900_C0007.txt' mode='r' encoding='US-ASCII'>

The search result is correct, but the output format is not what I expected. So I need to rename stuff here. Unless there's a better way to get (print) the results? Currently, this gets output by print( f ) .

The only thing I need from this output is to grab the actual file name:
FW_A01_2020-12-01_1856_C0004.txt and FW_A01_2020-12-01_1900_C0007.txt .
Then I need to rename & add the full path and finally store those search results files (clip list) in a continuous list, formatted like this:

> '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1806_C0001/FW_A01_2020-12-01_1806_C0001_000000.dng', '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1806_C0001/FW_A01_2020-12-01_1806_C0001_000000.dng',
  3. Rename the 'search result' filenames (and add the full path), then store them in a variable. Since I don't know (yet) how to pipe my actual search results into this function, I'll list the files in rootDir instead to perform the 'rename' as a test.

     from os import listdir

     # mediaDir and project are defined elsewhere in the script
     listofFiles = listdir(rootDir)
     for currentFile in listofFiles:
         sourceFile = rootDir + "/" + currentFile
         mainNameEnd = currentFile.find('.')
         newFileName = currentFile[:mainNameEnd] + '_000000.dng'
         dirLoc = currentFile[:mainNameEnd]
         fullPathName = "'" + mediaDir + project.GetName() + "/" + "footage" + "/" + dirLoc + "/" + newFileName + "'" + "," + " "
         print("Converting path name: " + fullPathName)

This outputs:

Converting path name: '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1806_C0001/FW_A01_2020-12-01_1806_C0001_000000.dng',
Converting path name: '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1812_C0003/FW_A01_2020-12-01_1812_C0003_000000.dng',
Converting path name: '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1856_C0004/FW_A01_2020-12-01_1856_C0004_000000.dng',

Great, exactly the output format I need. However, this only works with files found in rootDir . What I really need is to grab the 'search result' clip list and rename those files the same way. Also, the clip list needs to be a continuous line as shown earlier.

Once that's working I'll use the reformatted clip list in the function below. This will then import the clips into an external app.

# Import clips from Search Result
# We insert the search_result_clip_list, separated by comma. 
clips = resolve.GetMediaStorage().AddItemsToMediaPool(search_result_clip_list)  # <-- clip list goes here 
print(search_result_clip_list)

In a nutshell, I can't figure out how to take my search results, create a list, and finally use that list in the function above.

Would someone know how to implement this properly?

python 3.6.8 | MacOS 10.13.2 | Davinci Resolve 15

To get all the names in the same list:

You can use an empty list and add items to it in each loop like this:

my_names_list = []
for currentFile in listofFiles:
    sourceFile = rootDir + "/" + currentFile
    mainNameEnd = currentFile.find('.')
    newFileName = currentFile[:mainNameEnd] + '_000000.dng'
    dirLoc = currentFile[:mainNameEnd]
    fullPathName = "'" + mediaDir + project.GetName() + "/" + "footage" + "/" + dirLoc + "/" + newFileName + "'" + "," + " "
    print("Converting path name: " + fullPathName)
    my_names_list.append(fullPathName)

You will get a list with all the names as its items. Regarding "However, this only works with files found in rootDir": I don't really understand what you want; please try to be more specific.
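If the list is meant to be passed on to `AddItemsToMediaPool`, the quotes and trailing comma baked into `fullPathName` are probably not needed: they are only how Python displays a list of strings when printed. Below is a minimal sketch of building plain path strings instead. The helper name `transcript_to_clip_path` is hypothetical, the footage directory is copied from the example paths in the question, and it is an assumption that the import call accepts a Python list of path strings.

```python
import os


def transcript_to_clip_path(txt_name, footage_dir, suffix="_000000.dng"):
    """Map a transcript file name to the first frame inside its clip folder."""
    # 'FW_A01_2020-12-01_1856_C0004.txt' -> 'FW_A01_2020-12-01_1856_C0004'
    clip_name = os.path.splitext(os.path.basename(txt_name))[0]
    return os.path.join(footage_dir, clip_name, clip_name + suffix)


# Assumed location, taken from the example output in the question.
footage_dir = "/Volumes/RAID/Data/Media/TWO_CHAIRS/footage"
names = ["FW_A01_2020-12-01_1856_C0004.txt", "FW_A01_2020-12-01_1900_C0007.txt"]

# A plain list of strings; no embedded quotes or commas.
clip_list = [transcript_to_clip_path(name, footage_dir) for name in names]
print(clip_list)
```

Printing the list shows the quoted, comma-separated form from the question, while each element stays a clean path.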

The real file name is in the variable files, so you can simply use

print(files)

The variable f holds a file object, which reads data from the file; it is not the file name. Alternatively, you could use

print( f.name )

but I would prefer the first version.
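To make the difference concrete, here is a small self-contained sketch. It writes a throwaway file (the name `clip.txt` is made up for the demo) and compares what printing the file object shows with what `f.name` holds.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on real transcripts.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clip.txt")
    with open(path, "w") as f:
        f.write("hello")

    with open(path) as f:
        shown_by_print = repr(f)  # the text print(f) would display
        name_only = f.name        # the path that was passed to open()

print(shown_by_print)
print(name_only)
```

The first line printed is the `<_io.TextIOWrapper ...>` description seen in the question; the second is just the path.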


EDIT:

If you want to keep all the file names whose contents match the regex, use a list.

Before the loop, create searchResult = [], and inside the loop call searchResult.append( files ):

searchResult = []

for files in glob.glob( "*.txt" ):
    # ... code ...
    if (re.search(r'\b'+ re.escape(searchPhrase) + r'\b', contents, re.IGNORECASE)):
        print( files )
        searchResult.append( files )
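Put together, the search could be wrapped in a function that returns the matching file names. This is only a sketch under stated assumptions: the function name `find_transcripts` is hypothetical, and temporary files stand in for the real transcript directory so the example runs anywhere.

```python
import glob
import os
import re
import tempfile


def find_transcripts(directory, phrase):
    """Return names of .txt files in `directory` that contain `phrase` as whole words."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    matches = []
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        with open(path) as f:
            if pattern.search(f.read()):
                matches.append(os.path.basename(path))
    return matches


# Small demo: temporary files stand in for the transcripts.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "a.txt": "We have to act now.",
        "b.txt": "The actor left the room.",
        "c.txt": "ACT two begins.",
    }
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)
    search_result = find_transcripts(tmp, "act")

print(search_result)  # 'b.txt' is skipped: "actor" is not the whole word "act"
```

Building the path with `os.path.join` avoids the need for `os.chdir`. One caveat: `\b` only matches next to a letter, digit, or underscore, so a phrase that starts or ends with punctuation (for example `what?`) may not match as expected.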
