
Logic for a file compare

I'm trying to write a program to compare files. For example:

file1

1
2
3
4
5

file2

1
2
@
3
4
5

If I do it line by line, I get:

1 == 1; 
2 == 2;
3 != @;
4 != 3;
5 != 4;
  != 5;
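The position-by-position comparison above can be reproduced with a few lines of Python. This is a minimal sketch: `naive_compare`, `file1` and `file2` are illustrative names, and the files are assumed to be already read into lists of lines. It pairs line *k* of one file with line *k* of the other and pads the shorter file with blanks. A single inserted line therefore shifts everything after it, which is why every later line is reported as different.

```python
from itertools import zip_longest


def naive_compare(lines1, lines2):
    """Compare two lists of lines strictly by position (no alignment)."""
    rows = []
    # zip_longest pads the shorter list with "" so trailing lines still show up
    for a, b in zip_longest(lines1, lines2, fillvalue=""):
        op = "==" if a == b else "!="
        rows.append(f"{a} {op} {b};")
    return rows


# Hypothetical in-memory versions of the two example files
file1 = ["1", "2", "3", "4", "5"]
file2 = ["1", "2", "@", "3", "4", "5"]

for row in naive_compare(file1, file2):
    print(row)
```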

But the only real difference between the files is the @ line. I want to get something like this:

1 == 1;
2 == 2;
  != @;
3 == 3;
4 == 4;
5 == 5;

What is the best way to do this without using an external application such as diff, fc, etc.?

I wonder if Levenshtein distance would help you in this situation. It would tell you how similar the two files are, but I don't know whether you could zero in on the @. Something to look at nonetheless.
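One way to zero in on the @ is to compute the Levenshtein distance over *lines* instead of characters and then walk back through the dynamic-programming table to recover which edits produced that distance. The sketch below makes a few assumptions of its own: `levenshtein_align` and `format_alignment` are hypothetical helper names, whole lines are treated as the tokens, and the backtrace prefers match/substitute over insert over delete when several paths are equally cheap. That tie-breaking is an arbitrary choice, and a different order can give a different but equally short alignment. The table uses O(n·m) memory, which is fine for small files but not for very large ones.

```python
def levenshtein_align(a, b):
    """Line-level edit distance plus one optimal alignment.

    Each alignment step is (line_from_a, line_from_b); None marks a line
    that is missing on that side (an insertion or deletion).
    """
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # match or substitute

    # Walk back from the bottom-right corner to recover one alignment.
    steps = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            steps.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            steps.append((None, b[j - 1]))
            j -= 1
        else:
            steps.append((a[i - 1], None))
            i -= 1
    steps.reverse()
    return d[n][m], steps


def format_alignment(steps):
    """Render alignment steps in the 'x == y;' / 'x != y;' style used above."""
    rows = []
    for x, y in steps:
        op = "==" if x == y else "!="
        rows.append(f"{'' if x is None else x} {op} {'' if y is None else y};")
    return rows


file1 = ["1", "2", "3", "4", "5"]
file2 = ["1", "2", "@", "3", "4", "5"]
distance, steps = levenshtein_align(file1, file2)
print(distance)
print("\n".join(format_alignment(steps)))
```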

I believe what you're looking for is the distance between two strings; it might help you.

Python has a very handy library for comparing sequences called difflib. Its SequenceMatcher class takes two Python sequences and gives you (among other things) a list of opcodes describing how to get from the first sequence to the second (i.e. the differences). These are of the form:

  • Replace this block with that one
  • Insert a block
  • Delete a block
  • Copy a block (called 'equal')

These opcodes reference blocks by giving indices into the original sequences. This works on the lines of a file, the characters of a string, or anything else you can turn into a sequence in Python.
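A minimal sketch of this approach on the example from the question, assuming both files have already been read into lists of lines (for example with `f.read().splitlines()`). `difflib.SequenceMatcher` and `get_opcodes()` are the standard-library API. The `compare` helper and its output format are illustrative only; they mimic the layout the question asks for. `autojunk=False` turns off the heuristic that ignores very frequent items in long sequences, so every line is treated as significant.

```python
import difflib
from itertools import zip_longest

file1 = ["1", "2", "3", "4", "5"]
file2 = ["1", "2", "@", "3", "4", "5"]

matcher = difflib.SequenceMatcher(None, file1, file2, autojunk=False)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    # tag is 'equal', 'replace', 'delete' or 'insert';
    # file1[i1:i2] corresponds to file2[j1:j2]
    print(tag, file1[i1:i2], file2[j1:j2])


def compare(lines1, lines2):
    """Hypothetical helper: render opcodes in the 'x == y;' style."""
    rows = []
    sm = difflib.SequenceMatcher(None, lines1, lines2, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            rows += [f"{x} == {x};" for x in lines1[i1:i2]]
        else:
            # replace/delete/insert: pair the blocks, padding with blanks
            for x, y in zip_longest(lines1[i1:i2], lines2[j1:j2], fillvalue=""):
                rows.append(f"{x} != {y};")
    return rows


print("\n".join(compare(file1, file2)))
```

For this input the opcodes are an `equal` block for lines 1 and 2, an `insert` of `@`, and an `equal` block for lines 3 to 5, which is exactly the alignment the question wants.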

If you are not writing the program to learn about diff algorithms but are simply looking for a solution, you should try diff-match-patch. It contains implementations of diff and patch algorithms in several programming languages (C++, C#, Java, JavaScript, Python).

I tried its java version and it worked like a charm.

A bit out of date, I suppose :) but I came across this post because I was looking for help with the same problem: I have two files that I display side by side, and I have to mark the lines that don't match in red.

Mine is a little bit of a special case, though, because 1) order is not important, and 2) each line is guaranteed to occur only once (the text is a license file with definitions, line by line).

It turned out that the easiest way of doing it was just to make lists of the two files, ls1 and ls2, and do the following (in pseudocode):

i = 0;
while (i < ls1.count) {
    n = ls2.find(ls1[i]);
    if (n >= 0) {
        // found match in ls2
        ls1.Delete(i);
        ls2.Delete(n);
    } else
        i++;
}

In other words: for each line in ls1, see if there is a corresponding line in ls2. If so, delete both. What you're left with is exactly the differences, and you can easily mark up those lines in the original text.
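A runnable Python translation of the pseudocode, offered as a sketch. `unmatched_lines` is a hypothetical name, and `list.index` with `del` stands in for the `find`/`Delete` calls in the pseudocode. As in the original, `i` is only advanced when no match is found, because deleting `ls1[i]` moves the next line into position `i`. Under the poster's assumption that every line occurs only once, a set-based version (`unmatched_lines_fast`, also hypothetical) returns the same result in roughly linear time instead of quadratic.

```python
def unmatched_lines(lines1, lines2):
    """Return (lines only in lines1, lines only in lines2), ignoring order."""
    ls1, ls2 = list(lines1), list(lines2)  # work on copies
    i = 0
    while i < len(ls1):
        try:
            n = ls2.index(ls1[i])
        except ValueError:
            i += 1  # no match in ls2: keep ls1[i] and move on
            continue
        # Found a match in ls2: delete both; do not advance i
        del ls1[i]
        del ls2[n]
    return ls1, ls2


def unmatched_lines_fast(lines1, lines2):
    """Set-based variant; only valid if each line occurs at most once."""
    s1, s2 = set(lines1), set(lines2)
    return [x for x in lines1 if x not in s2], [y for y in lines2 if y not in s1]


print(unmatched_lines(["a", "b", "c"], ["c", "d", "a"]))
```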

Extremely easy, no libraries needed. Just my two cents...

