
Python: compare two files with different line endings

I have two files, test.a and test.b. test.a was pre-generated on a Unix machine. test.b is generated by the user and can be generated on either Windows or Unix.

I can't use filecmp.cmp('test01/test.a', 'test01/test.b') because it returns False whenever the two files use different line endings.

Is there an elegant solution to this? If not, what would be the best way to change the line endings of the Unix file before comparing it?

Thanks!

Assuming the two are text files, the standard open() and readline() functions should work. In Python 3, unless b is in the mode string, open() reads in universal-newlines mode, translating '\r\n' and '\r' to '\n':

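def cmp_lines(path_1, path_2):
    # Text mode ('r') uses universal newlines: '\r\n' and '\r' are read as '\n'.
    l1 = l2 = True
    with open(path_1, 'r') as f1, open(path_2, 'r') as f2:
        # readline() returns '' at EOF, which ends the loop.
        while l1 and l2:
            l1 = f1.readline()
            l2 = f2.readline()
            if l1 != l2:
                # The lines differ, or one file ended before the other.
                return False
    return True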

This compares the files line by line and returns False as soon as two lines differ (the with block also closes both files). If all lines match, it returns True. All newlines are automatically converted to \n. Note that readline() returns '' when EOF (end of file) is reached.
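A self-contained sketch of the same idea, with a small demo. It swaps the manual readline() loop for itertools.zip_longest (a variation, not the answer's exact code); the function name same_text and the temporary files are hypothetical. It writes the same two lines once with Unix and once with Windows newlines, then shows that filecmp.cmp(..., shallow=False) sees different bytes while the text-mode comparison sees equal lines.

```python
import filecmp
import os
import tempfile
from itertools import zip_longest


def same_text(path_1, path_2):
    # Hypothetical helper: True if both files hold the same lines.
    # Text mode with the default newline=None maps '\r\n' and '\r' to '\n'.
    with open(path_1, "r") as f1, open(path_2, "r") as f2:
        # zip_longest pads the shorter file with None, so a file with
        # extra lines ends up comparing a line against None.
        for line_1, line_2 in zip_longest(f1, f2):
            if line_1 != line_2:
                return False
    return True


# Demo: identical content, Unix vs Windows newlines.
tmp_dir = tempfile.mkdtemp()
unix_path = os.path.join(tmp_dir, "test.a")
windows_path = os.path.join(tmp_dir, "test.b")
with open(unix_path, "wb") as f:
    f.write(b"first\nsecond\n")
with open(windows_path, "wb") as f:
    f.write(b"first\r\nsecond\r\n")

byte_equal = filecmp.cmp(unix_path, windows_path, shallow=False)
text_equal = same_text(unix_path, windows_path)
print(byte_equal, text_equal)  # False True
```

Like the readline() version, this only holds one line per file in memory at a time, so it also works for large files.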

Alternatively, you could find out which newline sequence the first line of one file uses and, depending on the result, replace every instance of it with whatever the other file uses. You could then use cmp (or skip the replacement if both files already match). I know you said you are dealing with large files, so this may not suit you at all.

For detecting the newline sequence a file uses, see How can I detect DOS line breaks in a file?

For efficient search and replace on a large string, see Fastest Python method for search and replace on a large string.
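A minimal sketch of the detect-and-replace approach, assuming each file uses a single newline style throughout (mixed newlines would be converted incorrectly). The helper names first_newline and same_after_conversion are hypothetical. Instead of writing a converted copy to disk and calling filecmp.cmp, it compares the converted bytes in memory. Both files are read whole, so memory use grows with file size.

```python
import os
import tempfile


def first_newline(data):
    # Hypothetical helper: return the newline sequence (b"\r\n", b"\n" or
    # b"\r") that ends the first line of data, or None if there is none.
    lf = data.find(b"\n")
    cr = data.find(b"\r")
    if cr != -1 and (lf == -1 or cr < lf):
        # A carriage return comes first: Windows CRLF or a lone (old Mac) CR.
        return b"\r\n" if data[cr + 1:cr + 2] == b"\n" else b"\r"
    if lf != -1:
        return b"\n"
    return None


def same_after_conversion(path_1, path_2):
    # Assumes each file uses one newline style throughout.
    with open(path_1, "rb") as f:
        data_1 = f.read()
    with open(path_2, "rb") as f:
        data_2 = f.read()
    nl_1 = first_newline(data_1)
    nl_2 = first_newline(data_2)
    if nl_1 and nl_2 and nl_1 != nl_2:
        # Convert file 2 to file 1's newline style with one bytes.replace().
        data_2 = data_2.replace(nl_2, nl_1)
    return data_1 == data_2


# Demo: identical content, Unix vs Windows newlines.
tmp_dir = tempfile.mkdtemp()
unix_path = os.path.join(tmp_dir, "test.a")
windows_path = os.path.join(tmp_dir, "test.b")
with open(unix_path, "wb") as f:
    f.write(b"first\nsecond\n")
with open(windows_path, "wb") as f:
    f.write(b"first\r\nsecond\r\n")

print(same_after_conversion(unix_path, windows_path))  # True
```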

Hope this helps; apologies if not.
