
Fastest way to check whether a file contains any string from a list of strings

I have several .tgz log files, each containing a few hundred to a few thousand lines. I also have a list of error strings. I have to read every log file inside each archive and check whether any of the error strings is present in that file. I also need the name of the file in which the error pattern was found.

errorList = ["errorPattern1", "errorPattern2",..., "errorPatternN"]

Which is the fastest way to do it in Python?

Use nested loops: iterate over the '.tgz' files in the directory and over the members of each tarfile. Read the text of the entire file object at once, then check whether any of the error patterns appear in it.

Something like this:

import glob
import tarfile

error_list = ["errorPattern1", "errorPattern2"]  # the error strings to search for

for fname in glob.iglob('*.tgz'):
    with tarfile.open(fname, 'r:gz') as tar:

        for info in iter(tar.next, None):
            if not info.isfile():
                continue  # skip directories and other non-file members

            # extractfile() yields bytes; decode so the str patterns match
            text = tar.extractfile(info).read().decode('utf-8', errors='replace')

            if any(msg in text for msg in error_list):
                print("an error message was found in:", fname, info.name)
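Since the question asks for the fastest approach: with a long pattern list, `any(msg in text ...)` scans the text once per pattern. A common speedup (a sketch, not part of the original answer; the pattern list here is hypothetical) is to join the escaped patterns into a single compiled regex alternation, so each file's text is scanned only once:

```python
import re

# Hypothetical error patterns; re.escape() keeps them as literal strings
error_list = ["errorPattern1", "errorPattern2", "errorPatternN"]
error_re = re.compile("|".join(re.escape(msg) for msg in error_list))

text = "line 1\nsomething errorPattern2 happened\nline 3"
match = error_re.search(text)
if match:
    print("first error found:", match.group(0))
```

This also tells you which pattern matched via `match.group(0)`, which the plain `in` test does not.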
