简体   繁体   中英

Efficient Methods of Comparing Text Files Simultaneously

I did check to see if any existing questions matched mine but I didn't see any, if I did, my mistake.

I have two text files to compare against each other, one is a temporary log file that is overwritten sometimes, and the other is a permanent log, which will collect and append all of the contents of the temp log into one file (it will collect new lines in the log since when it last checked and append the new lines to the end of the complete log). However after a point this may lead to the complete log becoming quite large and therefore not so efficient to compare against so i have been thinking about different methods to approach this.

my first idea is to "buffer" the temp log (being that it will normally be the smaller of the two) strings into a list and simply loop through the archive log and do something like:

List<String> bufferedlines = new List<string>();
using (StreamReader ArchiveStream = new StreamReader(ArchivePath))
{
    if (bufferedlines.Contains(ArchiveStream.ReadLine()))
    {

    }
}

Now there is a couple of ways I could proceed from here, I could create yet another list to store the inconsistencies, close the read stream (I'm not sure you can both read and write at the same time, if you can that might make things easier for my options) then open a write stream in append mode and write the list to the file. alternatively, cutting out the buffering the inconsistencies, i could open a write stream while the files are being compared and on the spot write the lines that aren't matched.

The other method i could think of was limited by my knowledge of whether it could be done or not, which was rather than buffer either file, compare the streams side by side as they are read and append the lines on the fly. Something like:

using (StreamReader ArchiveStream = new StreamReader(ArchivePath))
{
    using (StreamReader templogStream = new StreamReader(tempPath))
    {
        if (!(ArchiveStream.ReadAllLines.Contains(TemplogStream.ReadLine())))
        {
            //write the line to the file
        }
    }
}

As I said I'm not sure whether that would work or that it may be more efficient than the first method, so i figured i'd ask, see if anyone had insight into how this might properly be implemented, and whether it was the most efficient way or there was a better method out there.

Effectively what you want here is all of the items from one set that aren't in another set. This is set subtraction, or in LINQ terms, Except . If your data sets were sufficiently small you could simply do this:

var lines =  File.ReadLines(TempPath)
    .Except(File.ReadLines(ArchivePath))
    .ToList();//can't write to the file while reading from it
File.AppendAllLines(ArchivePath, lines);

Of course, this code requires bringing the all of the lines in the temp file into memory, because that's just how Except is implemented. It creates a HashSet of all of the items so that it can efficiently find matches from the other sequence.

Presumably here the number of lines that need to be added here is pretty small, so the fact that the lines that we find here all need to be stored in memory isn't a problem. If there will potentially be a lot the, you'd want to write them out to another file besides the first one (possibly concatting the two files together when done, if needed).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM