Get Estimate of Line Count in a text file

I would like to get an estimate of the number of lines in a csv/text file so that I can use that number for a progress bar. The file could be extremely large so getting the exact number of lines will take too long for this purpose.

What I have come up with is below (read in a portion of the file and count the number of lines and use the file size to estimate the total number of lines):

    public static int GetLineCountEstimate(string file)
    {
        double count = 0;
        using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read))
        {
            long byteCount = fs.Length;
            int maxByteCount = 524288;
            if (byteCount > maxByteCount)
            {
                var buf = new byte[maxByteCount];
                fs.Read(buf, 0, maxByteCount);
                string s = System.Text.Encoding.UTF8.GetString(buf, 0, buf.Length);
                count = s.Split('\n').Length * byteCount / maxByteCount;
            }
            else
            {
                var buf = new byte[byteCount];
                fs.Read(buf, 0, (int)byteCount);
                string s = System.Text.Encoding.UTF8.GetString(buf, 0, buf.Length);
                count = s.Split('\n').Length;
            }
        }
        return Convert.ToInt32(count);
    }

This seems to work ok, but I have some concerns:

1) I would like to have my parameter simply as Stream (as opposed to a filename) since I may also be reading from the clipboard (MemoryStream). However Stream doesn't seem to be able to read n bytes at once into a buffer or get the total length of the Stream in bytes, like FileStream can. Stream is the parent class to both MemoryStream and FileStream.

2) I don't want to assume an encoding such as UTF-8.

3) I don't want to assume an end-of-line character (it should work for CR, CRLF, and LF).

I would appreciate any help to make this function more robust.

Here is what I came up with as a more robust solution for estimating line count.

public static int EstimateLineCount(string file)
{
    using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read))
    {
        return EstimateLineCount(fs);
    }
}

public static int EstimateLineCount(Stream s)
{
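    // Requires: using System; using System.IO; using System.Text;
    // If the stream is larger than 10 MB, estimate the line count; otherwise count exactly.
    const int maxBytes = 10485760; // 10 MB = 1024 * 1024 * 10 bytes

    s.Position = 0; // the stream must be seekable (e.g. a MemoryStream or a FileStream on a regular file)
    // UTF-8 is only the fallback: this constructor also detects the encoding from a byte order mark.
    // Note that disposing the StreamReader also closes the underlying stream.
    using (var sr = new StreamReader(s, Encoding.UTF8))
    {
        int lineCount = 0;
        if (s.Length > maxBytes)
        {
            // s.Position advances in buffer-sized steps because StreamReader reads ahead,
            // so it slightly overstates the bytes consumed by the lines counted so far.
            while (s.Position < maxBytes && sr.ReadLine() != null)
                lineCount++;

            // Scale the sampled count by the ratio of total bytes to bytes read.
            return Convert.ToInt32((double)lineCount * s.Length / s.Position);
        }

        // ReadLine treats CR, LF, and CRLF as line terminators.
        while (sr.ReadLine() != null)
            lineCount++;
        return lineCount;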
    }
}
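
For comparison, here is a rough sketch of the same sampling idea done at the byte level, so no `StreamReader` buffering gets in the way of the position arithmetic. It is not from the original answers: the class name `LineCountSketch`, the `sampleBytes` parameter, and its 1 MiB default are all made up for illustration. It assumes the stream is seekable and that the text is in a single-byte encoding or UTF-8, where the bytes 0x0D and 0x0A only ever appear as line terminators. It would need changes for UTF-16 or UTF-32.

```csharp
using System;
using System.IO;

public static class LineCountSketch
{
    // Hypothetical helper: estimates the line count of a seekable stream by
    // counting line terminators (CR, LF, or CRLF) in the first sampleBytes bytes.
    public static long EstimateLineCount(Stream s, int sampleBytes = 1 << 20)
    {
        if (!s.CanSeek)
            throw new NotSupportedException("Stream must be seekable to report its Length.");
        if (s.Length == 0)
            return 0;

        s.Position = 0;
        var buf = new byte[81920];
        long sampled = 0, terminators = 0;
        bool lastWasCr = false;
        int lastByte = -1;
        int read;

        // Read may return fewer bytes than requested, so loop until the sample is full.
        while (sampled < sampleBytes &&
               (read = s.Read(buf, 0, (int)Math.Min(buf.Length, sampleBytes - sampled))) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = buf[i];
                if (b == (byte)'\r')
                {
                    terminators++;
                    lastWasCr = true;
                }
                else
                {
                    // An LF directly after a CR belongs to the same CRLF terminator.
                    if (b == (byte)'\n' && !lastWasCr)
                        terminators++;
                    lastWasCr = false;
                }
            }
            lastByte = buf[read - 1];
            sampled += read;
        }

        if (sampled >= s.Length)
        {
            // The whole stream was read, so the count is exact.
            // Add one for a final line that has no terminator.
            bool endsWithTerminator = lastByte == '\r' || lastByte == '\n';
            return endsWithTerminator ? terminators : terminators + 1;
        }

        // Scale the sampled count by the ratio of total bytes to sampled bytes.
        return (long)Math.Round((double)terminators * s.Length / sampled);
    }
}
```

Because it takes a plain `Stream`, the same call should work for a `FileStream` or a `MemoryStream`, e.g. `LineCountSketch.EstimateLineCount(File.OpenRead(path))`. The estimate is only as good as the assumption that the first chunk of the file has typical line lengths.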

// Exact count; requires using System.IO and System.Linq.
var lineCount = File.ReadLines(@"C:\file.txt").Count();

Another way:

var lineCount = 0;
using (var reader = File.OpenText(@"C:\file.txt"))
{
    while (reader.ReadLine() != null)
    {
        lineCount++;
    }
}

You're cheating! You're asking more than one question... I'll try to help you anyway :P

  1. No, you can't use Stream, but you can use StreamReader. This should provide the flexibility you need.

  2. Test for the encoding, since I deduce you'll be working with several. Keep in mind, however, that it's usually hard to cater for ALL scenarios, so pick a few important ones first and extend your program later.

  3. Don't - let me show you how:

First, consider your source. Whether it's a file or a memory stream, you should have an idea of its size. I've done the file bit because I'm lazy and it's easy, so you'll have to figure out the memory stream bit yourself. What I've done is much simpler but less accurate: read the first line of the file and express it as a percentage of the file's size. Note that I multiplied the length of the string by 2, which is the number of bytes each extra character takes in a .NET string in memory (UTF-16). On disk the figure depends on the file's encoding (typically 1 byte per character for ASCII or mostly-ASCII UTF-8 text), so adjust the factor to match. Obviously this isn't very accurate, so you can extend it to the first x lines; just keep in mind that you'll have to change the formula as well.

static void Main(string[] args)
    {
        FileInfo fileInfo = new FileInfo(@"C:\Muckabout\StringCounter\test.txt");
        using (var stream = new StreamReader(fileInfo.FullName))
        {
            var firstLine = stream.ReadLine(); // Read the first line (null if the file is empty).
            if (firstLine != null)
                Console.WriteLine("First line read. This is roughly " + (firstLine.Length * 2.0) / fileInfo.Length * 100 + " per cent of the file.");
        }
        Console.ReadKey();
    }
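
Extending the idea to the first x lines could look something like the sketch below. Treat it as an illustration under stated assumptions rather than a finished implementation: the names `FirstLinesSketch`, `EstimateFromFirstLines`, and `maxLines` are invented here, the caller has to supply the total size in bytes (for a file, `fileInfo.Length`), and each line is assumed to end in a one-byte terminator. The changed formula is: total bytes divided by the average encoded size of the sampled lines.

```csharp
using System;
using System.IO;
using System.Text;

public static class FirstLinesSketch
{
    // Hypothetical helper: averages the encoded size of the first maxLines lines,
    // then scales up to totalBytes to estimate the total number of lines.
    public static long EstimateFromFirstLines(Stream s, long totalBytes, Encoding encoding, int maxLines = 100)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var reader = new StreamReader(s, encoding, true, 4096, leaveOpen: true))
        {
            long sampledBytes = 0;
            int lines = 0;
            string line;
            while (lines < maxLines && (line = reader.ReadLine()) != null)
            {
                // Measure the line in the encoding the reader actually settled on.
                // The +1 is an assumed one-byte terminator (LF); CRLF files would need +2.
                sampledBytes += reader.CurrentEncoding.GetByteCount(line) + 1;
                lines++;
            }

            if (lines == 0)
                return 0;           // empty input
            if (lines < maxLines)
                return lines;       // reached the end of the input: this count is exact

            // lines / sampledBytes is the sampled line density; scale it to the full size.
            return (long)Math.Round((double)totalBytes * lines / sampledBytes);
        }
    }
}
```

For the file case from above this would be called as `FirstLinesSketch.EstimateFromFirstLines(File.OpenRead(fileInfo.FullName), fileInfo.Length, Encoding.UTF8)`. Measuring the lines with `GetByteCount` instead of multiplying the string length by a fixed factor keeps the estimate tied to the on-disk encoding.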
