
Searching strings and metadata in multiple files

I need to search thousands of files for specific strings, metadata, hex tags, etc., but this Python code I've written only searches the one file, which would take an extremely long time:

def check():
    found = False
    # file() is Python 2 only; use open(), and a with-block to close the file
    with open('example.txt') as datafile:
        for line in datafile:
            if 'blabla' in line:
                found = True
                break
    return found

found = check()
if found:
    print("true")
else:
    print("false")

any suggestions? Thanks

Make the file name/path a parameter to the function, so it can process any file, not just one particular file. Then call the function for each file that you want it to process. You will probably want to build a list of the file names/paths, then loop over it and do what you want for each file.

Eg.

def check(fname):
    datafile = open(fname)
    found = False
    # ...
    return found

files = ['a', 'b', 'c']
for fname in files:
    found = check(fname)
    if found:
        print("true")
    else:
        print("false")

Assuming the files are all contained in a directory "/foo":

import os, re

# Use re.findall() to avoid line-by-line parsing
myrex = re.compile('blabla')

def check(filename):
    with open(filename) as myfile:
        matches = myrex.findall(myfile.read())
        return len(matches) > 0

os.chdir("/foo")
# Use os.walk() to find the names of all files in this directory (and below)
for root, dirs, files in os.walk('.'):
    for fname in files:
        # join root and fname so files in subdirectories open correctly
        path = os.path.join(root, fname)
        print(path + ": " + str(check(path)))

If the files are stored in multiple locations, you'll need an extra loop around the "os.chdir()" block. If you have multiple patterns you're searching for, use another "re.compile()".
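A minimal sketch of that extra loop over several directories, with more than one compiled pattern (the directory names and patterns here are illustrative):

```python
import os
import re

# Hypothetical inputs -- substitute your own directories and patterns.
directories = ["/foo", "/bar"]
patterns = [re.compile("blabla"), re.compile("hex[0-9a-f]+")]

def check(path, regexes):
    """Return True if any of the compiled patterns matches the file's contents."""
    with open(path) as f:
        text = f.read()
    return any(rex.search(text) for rex in regexes)

def scan(dirs, regexes):
    """Walk every directory tree and map each file path to a match result."""
    results = {}
    for d in dirs:
        for root, _, files in os.walk(d):
            for fname in files:
                path = os.path.join(root, fname)
                results[path] = check(path, regexes)
    return results
```

`scan(directories, patterns)` then returns a dict you can filter for the files that matched.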

Does this help answer your question?

You may wish to consider glob or os.walk to retrieve filenames, but something like:

import fileinput

print(any('blabla' in line for line in fileinput.input(['some', 'list', 'of', 'file', 'names'])))

This automatically reads the files sequentially and will short-circuit on the truth test.

If all the files are in a single directory you can get their names with os.listdir(). This gives you a list of all the files in the directory, e.g. os.listdir('/home/me/myData'). If you are on a Unix-based system, grep is a very powerful tool that gives you a lot of flexibility. You may want grep -r "your query" ./ > results.txt. This prints every line that matches your search, supports regular expressions, and saves the results to a file. Otherwise, to search a lot of files with Python only:

import os

def check(x):
    return "blabla" in x

dirname = '/home/me/files'
for f in os.listdir(dirname):
    # listdir() returns bare names, so join with the directory
    x = open(os.path.join(dirname, f), "r").read()
    print(check(x))

My check function behaves a little differently: it doesn't check line by line, and True and False are printed capitalized.
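For completeness, the grep approach mentioned earlier can also be driven from Python via subprocess; a sketch, assuming grep is on the PATH:

```python
import subprocess

def grep_recursive(pattern, directory):
    """Run `grep -r pattern directory` and return the matching lines."""
    proc = subprocess.run(
        ["grep", "-r", pattern, directory],
        capture_output=True, text=True,
    )
    if proc.returncode > 1:  # grep exits 0 on match, 1 on no match, >1 on error
        raise RuntimeError(proc.stderr)
    return proc.stdout.splitlines()
```

Each returned line has the form `path:matching line`, which conveniently tells you which file the hit came from.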

I imagine you might want to know which file the results came from. (and what line?)

for f in os.listdir(dirname):
    lines = open(os.path.join(dirname, f), "r").read().split('\n')
    for count, line in enumerate(lines):
        if check(line):
            # count is an int, so convert it before concatenating
            print(f + " " + str(count) + " " + line)

...or whatever you need to know.
