Ignoring lines while comparing files using Python

I have two text files that I want to compare using Python. Both files have a date in their header, so I want to ignore that line during the comparison: it will always vary and should not be treated as a difference.

File1

Date : 04/29/2013
Some Text
More Text
....

File2

Date : 04/28/2013
Some Text
More Text
....

I tried comparing them with the filecmp module, but it doesn't support any argument for ignoring a pattern. Is there another module that can be used for this purpose? I also tried difflib but was not successful. Moreover, I only want to know whether the files differ, as True or False; difflib was printing all the lines even when there was no difference, prefixing the unchanged ones with whitespace.

Use itertools.ifilter (or builtin filter in Python 3)

itertools.ifilter(predicate, iterable)

Your predicate should be a function that returns False for the lines you want to ignore, e.g.

def predicate(line):
    if 'something' in line:
        return False # ignore it
    return True

Then apply it to your file object: fin = ifilter(predicate, fin)

Then just use something like

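from itertools import izip_longest, ifilter # on Python 3 instead use zip_longest and builtin filter
f1 = ifilter(predicate, f1)
f2 = ifilter(predicate, f2)

# izip_longest pads the shorter input with None, so files of different lengths compare unequal
all(x == y for x, y in izip_longest(f1, f2))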

You don't need difflib unless you want to see what the differences were, and since you tried filecmp I assume you only want to know whether there was a difference or not. Unfortunately, filecmp only works with filenames.
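For Python 3, the same idea might look like the sketch below. The function names keep and same_content are made up for illustration, and the predicate assumes the header line starts with 'Date'. It uses zip_longest rather than zip, because zip stops at the shorter input and would report two files as equal when one merely has extra trailing lines.

```python
from itertools import zip_longest

def keep(line):
    # Hypothetical predicate: drop the header line that carries the date.
    return not line.startswith('Date')

def same_content(path1, path2):
    # Returns True when the two files match once filtered lines are ignored.
    with open(path1) as f1, open(path2) as f2:
        lines1 = filter(keep, f1)
        lines2 = filter(keep, f2)
        # zip_longest pads the shorter iterator with None, so a file
        # with extra lines at the end counts as different.
        return all(x == y for x, y in zip_longest(lines1, lines2))
```

Both files are read lazily, one line at a time, so this should also work for files too large to hold in memory.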

Also, to skip the first line of each file, just use itertools.islice(fin, 1, None)
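As a rough Python 3 sketch of the islice variant (same_after_first_line is a hypothetical name, and it assumes the date is always exactly the first line of each file):

```python
from itertools import islice, zip_longest

def same_after_first_line(path1, path2):
    # Compare two files while skipping line 1 of each, whatever it contains.
    with open(path1) as f1, open(path2) as f2:
        body1 = islice(f1, 1, None)  # start at index 1, i.e. the second line
        body2 = islice(f2, 1, None)
        # zip_longest makes a length mismatch show up as a difference.
        return all(x == y for x, y in zip_longest(body1, body2))
```

Unlike the predicate approach, this ignores the first line unconditionally, so it would also hide a difference in a first line that is not a date.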

from itertools import ifilter, izip_longest

def predicate(line):
    ''' you can add other general checks in here '''
    if line.startswith('Date'):
        return False # ignore it
    return True

with open('File1.txt') as f1, open('File2.txt') as f2:
    f1 = ifilter(predicate, f1)
    f2 = ifilter(predicate, f2)
    # izip_longest pads the shorter file with None, so extra lines count as a difference
    print(all(x == y for x, y in izip_longest(f1, f2)))

>>> True

If you know the date is always on the first line and you read the lines into a list of strings, you can simply drop the first line by writing lines[1:]
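A minimal sketch of that approach, assuming both files are small enough to read into memory (same_body is a hypothetical helper name):

```python
def same_body(path1, path2):
    # Read each file into a list of lines.
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # Slice off the first line (the date header) and compare the rest.
    return lines1[1:] == lines2[1:]
```

List equality also checks the lengths, so a file with extra lines is reported as different.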

Added after comment:

It's probably best to use ifilter as in the other solution. If the files are different, you have to iterate through them (using two indices, one for each file) and skip lines that contain one of the keywords.
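The two-index idea could be sketched roughly as follows. The function name same_ignoring_keywords and the keywords parameter are assumptions made for illustration; the inputs are lists of lines already read from each file.

```python
def same_ignoring_keywords(lines1, lines2, keywords):
    def skip(line):
        # A line is ignored when it contains any of the keywords.
        return any(k in line for k in keywords)

    i = j = 0  # one index per file
    while True:
        # Advance each index past the lines that should be ignored.
        while i < len(lines1) and skip(lines1[i]):
            i += 1
        while j < len(lines2) and skip(lines2[j]):
            j += 1
        end1, end2 = i >= len(lines1), j >= len(lines2)
        if end1 or end2:
            # Equal only if both files ran out at the same time.
            return end1 and end2
        if lines1[i] != lines2[j]:
            return False
        i += 1
        j += 1
```

Because each index advances independently, the ignored lines do not have to sit at the same position in both files.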


 