
Find a file in a directory using python by partial name

I have a directory with several hundred thousand files in it.

They all follow this format:

datetime_fileid_metadata_collect.txt

A specific example looks like this:

201405052359559_0002230255_35702088_collect88.txt

I am trying to write a script that pulls out and copies individual files when all I provide it is a list of file ids.

For example, I have a text document, fileids.txt, that contains this:

fileids.txt
0002230255
0001627237
0001023000

This is the example script I have written so far. The file1 result keeps coming back as [].

import os
import re, glob, shutil
base_dir = 'c:/stuff/tub_0_data/'
destination = 'c:/files_goes_here'
os.chdir(base_dir)
text_file = open('c:/stuff/fileids.txt', 'r')
file_ids = text_file.readlines()
#file_ids = [stripped for stripped in (line.strip() for line in text_file.readlines()) if stripped]
for ids in file_ids:
    id1 = ids.rstrip()
    print 'file id = ',str(id1)
    file1 = glob.glob('*' + str(id1) + '*')
    print str(file1)
    if file1 != []:
        shutil.copy(base_dir + file1, destination)

I know I don't fully understand glob or regular expressions yet. What would I put there if I want to find files based on a specific string in their filename?

EDIT:

glob.glob('*' + stuff + '*')

worked for finding things within the filename. Not stripping the trailing newline from each ID was the issue.

text_file.readlines() returns each line in full, including the trailing '\n'. Try stripping it. The following strips the newlines and drops empty lines:

file_ids = [line.strip() for line in text_file if not line.isspace()]
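
To make the difference visible, here is a minimal sketch in Python 3 syntax. It uses an in-memory io.StringIO object as a stand-in for fileids.txt (the contents, including the blank line, are made up for illustration) and compares the raw readlines() result with the stripped list:

```python
import io

# In-memory stand-in for fileids.txt (hypothetical contents with one blank line).
text_file = io.StringIO("0002230255\n0001627237\n\n0001023000\n")

# readlines() keeps the newline on every entry, so a pattern such as
# '*' + raw_ids[0] + '*' would contain a newline and miss the example filename.
raw_ids = text_file.readlines()
print(repr(raw_ids[0]))

# Rewind, then strip each line and skip the whitespace-only ones.
text_file.seek(0)
file_ids = [line.strip() for line in text_file if not line.isspace()]
print(file_ids)
```

Printing with repr() is a quick way to spot stray whitespace in values read from a file.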

Your issue may well have been the trailing newline, and that has been answered, but I think the code could do with some cleaning up. Admittedly, I don't see the need for import re, and import os is only needed for the os.chdir call, unless they are part of your bigger code.

Something like the following works well enough.

Code:

import glob
import shutil

base_dir = "C:/Downloads/TestOne/"
dest_dir = "C:/Downloads/TestTwo/"

with open("blah.txt", "r") as ofile:
    # Strip the trailing newline from each ID and skip blank lines.
    lines = [line.strip() for line in ofile if line.strip()]

for line in lines:
    print("File ID to Process: {}".format(line))
    pattern_ = base_dir + "*" + line + "*"
    print(pattern_)
    # glob.glob returns a list of matching paths (empty if nothing matches).
    file_ = glob.glob(pattern_)
    if not file_:
        print("No match for {}".format(line))
        continue
    print(file_[0])
    shutil.copy(file_[0], dest_dir)
    print("{} copied.".format(file_[0]))

Output:

File ID to Process: 123456
C:/Downloads/TestOne/*123456*
C:/Downloads/TestOne\foobar_123456_spam.txt
C:/Downloads/TestOne\foobar_123456_spam.txt copied.
[Finished in 0.4s]

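glob is a rather expensive operation here, though: every call scans the directory again, and you have several hundred thousand files. You're better off listing the files once up front and matching them against the IDs afterwards, copying as you hit a match. Hope this helps.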
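
The list-once approach suggested above could be sketched roughly as follows, in Python 3. This is only one way to do it, under the assumption that every filename follows the question's datetime_fileid_metadata_collect.txt scheme, so the file ID is the second underscore-separated field. The function names build_index and copy_by_ids are hypothetical, chosen for this illustration.

```python
import os
import shutil


def build_index(base_dir):
    """Scan base_dir once and map each file ID to the filenames carrying it.

    Assumes names look like datetime_fileid_metadata_collect.txt, so the
    file ID is the second underscore-separated field.
    """
    index = {}
    with os.scandir(base_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # ignore subdirectories
            parts = entry.name.split("_")
            if len(parts) >= 2:
                index.setdefault(parts[1], []).append(entry.name)
    return index


def copy_by_ids(base_dir, dest_dir, file_ids):
    """Copy every file whose ID is in file_ids; return the IDs with no match."""
    index = build_index(base_dir)
    missing = []
    for file_id in file_ids:
        names = index.get(file_id)
        if not names:
            missing.append(file_id)
            continue
        for name in names:
            shutil.copy(os.path.join(base_dir, name), dest_dir)
    return missing
```

Calling copy_by_ids(base_dir, dest_dir, file_ids) with the stripped ID list reads the directory a single time, and each ID then costs one dictionary lookup instead of a fresh directory scan. The returned list shows which IDs had no matching file.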
