
Need suggestions on optimizing this code

Currently, when I read a 15 MB file, my application goes over a gigabyte of memory. Notice that, at the end of the main code, I compare the data that was inserted into the database with the original array from the file. Any suggestions are welcome.

Main code:

TestEntities entities = new TestEntities();

using (FileStream fileStream = new FileStream(fileName + ".exe", FileMode.Open, FileAccess.Read))
{
    byte[] bytes = new byte[fileStream.Length];

    int numBytesToRead = (int)fileStream.Length;
    int numBytesRead = 0;

    // Read the entire file into the byte array.
    while (numBytesToRead > 0)
    {
        int n = fileStream.Read(bytes, numBytesRead, numBytesToRead);

        if (n == 0)
            break;

        numBytesRead += n;
        numBytesToRead -= n;
    }

    // Split the bytes into chunks of 100.
    var query = bytes.Select((x, i) => new { Index = i, Value = x })
        .GroupBy(x => x.Index / 100)
        .Select(x => x.Select(v => v.Value).ToList())
        .ToList();

    // Store each chunk as a row in the database.
    foreach (List<byte> list in query)
    {
        Binary binary = new Binary();
        binary.Name = fileName + ".exe";
        binary.Value = list.ToArray();
        entities.AddToBinaries(binary);
    }

    entities.SaveChanges();

    // Read the chunks back and verify they match the original bytes.
    List<Binary> fileString = entities.Binaries.Where(b => b.Name == fileName + ".exe").ToList();

    Byte[] final = ExtractArray(fileString);
    if (Compare(bytes, final))
    {
        // Some notification that everything was OK
    }
}

Compare Method:

public bool Compare(Byte[] array1, Byte[] array2)
{
    bool isEqual = false;
    if (array1.Count() == array2.Count())
    {
        for (int i = 0; i < array1.Count(); i++)
        {
            isEqual = array1[i] == array2[i];
            if (!isEqual)
            {
                break;
            }
        }
    }

    return isEqual;
}

ExtractArray Method:

public Byte[] ExtractArray(List<Binary> binaries)
{
    List<Byte> finalArray = new List<Byte>();

    foreach (Binary binary in binaries)
    {
        foreach (byte b in binary.Value)
        {
            finalArray.Add(b);
        }
    }

    return finalArray.ToArray();
}

For starters, I'd strongly recommend that you invest in a profiler. That's the right way to determine why your code is taking so long to run or is using a lot of memory. There are many profilers out there, including one built into Visual Studio 2010 if you have Premium or Ultimate.

See Google or these posts for others:

What Are Some Good .NET Profilers?

and

Best .NET memory and performance profiler?

Secondly, you probably shouldn't assume that your app must stay under a gig of memory. C# applications (in fact, all .NET applications) are garbage collected. On a machine with plenty of RAM, there is no reason for the GC to run when there is no memory pressure, and if it doesn't run the application can easily use up a gig of memory. That is particularly true in 64-bit environments, where processes are not subject to the memory limits of a 32-bit address space.
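
To see how much of that memory is actually live managed data, as opposed to memory the GC simply hasn't bothered to reclaim yet, a quick check along these lines can help (a sketch; GC.GetTotalMemory(true) forces a full collection before measuring):

long workingSet = Environment.WorkingSet;    // roughly what Task Manager shows for the process
long liveManaged = GC.GetTotalMemory(true);  // live managed objects only, after a forced collection

Console.WriteLine("Working set:       {0} MB", workingSet / (1024 * 1024));
Console.WriteLine("Live managed heap: {0} MB", liveManaged / (1024 * 1024));

If the second number is far smaller than the first, most of what you are seeing is garbage the collector hasn't needed to free yet, not data your code is actually holding on to.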

First, two variants of Compare:

bool arraysAreEqual = Enumerable.SequenceEqual(array1, array2);

or this one

    public bool Compare(Byte[] array1, Byte[] array2)
    {
        if (array1.Length != array2.Length)
            return false;

        for (int i = 0; i < array1.Length; i++)
        {
            if (array1[i] != array2[i])
                return false;
        }
        return true;            
    }

For ExtractArray, try this:

foreach (Binary binary in binaries)
{
     finalArray.AddRange(binary.Value);
}
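
Going a step further, the intermediate List<Byte> can be skipped entirely by sizing the result array up front and copying each chunk into it. A sketch, assuming the same Binary entity as in the question:

public Byte[] ExtractArray(List<Binary> binaries)
{
    // Add up the chunk sizes so the result can be allocated once.
    int totalLength = 0;
    foreach (Binary binary in binaries)
    {
        totalLength += binary.Value.Length;
    }

    var result = new Byte[totalLength];
    int offset = 0;

    foreach (Binary binary in binaries)
    {
        // Copy each chunk directly into the result array.
        Buffer.BlockCopy(binary.Value, 0, result, offset, binary.Value.Length);
        offset += binary.Value.Length;
    }

    return result;
}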

1) Do you know the static method File.ReadAllBytes? It could save you the first fifteen lines of code.

2) I hate LINQ... It's unreadable, and it's hard to understand what is really going on.

        var query = bytes.Select((x, i) => new {Index = i, Value = x})
            .GroupBy(x => x.Index/100)
            .Select(x => x.Select(v => v.Value).ToList())
            .ToList();

So for each byte of your file, you create an object containing the byte itself and its index. Wow. If your file is 15 MB, that's 15,728,640 objects. Let's say each object takes 64 bytes; that's 960 MB of memory.

Btw, what are you trying to do?

Edit

var bytes = File.ReadAllBytes(fileName + ".exe");

// Number of 100-byte chunks, rounding up so the final partial chunk is included.
var chunkCount = (int)Math.Ceiling(bytes.Length / 100.0);

var chunks = new List<ArraySegment<byte>>(chunkCount);

for (int i = 0; i < chunkCount; i++)
{
    chunks.Add(new ArraySegment<byte>(
        bytes,
        i * 100,
        Math.Min(100, bytes.Length - i * 100)));
}

This should be several times faster.
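
One thing to note: Binary.Value in the question is a byte[], so each segment still has to be copied into its own array before it can be stored. A minimal sketch of that step, reusing the entities and Binary type from the question:

foreach (ArraySegment<byte> chunk in chunks)
{
    // Copy the segment out of the shared buffer, since the entity stores its own byte[].
    var value = new byte[chunk.Count];
    Buffer.BlockCopy(chunk.Array, chunk.Offset, value, 0, chunk.Count);

    var binary = new Binary { Name = fileName + ".exe", Value = value };
    entities.AddToBinaries(binary);
}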

Still, for better performance, you could insert the chunks into the database as you read the file, without keeping all those bytes in memory.
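
A rough sketch of that idea, reading the file 100 bytes at a time and inserting each chunk as it is read, using the TestEntities/Binary model from the question:

using (var entities = new TestEntities())
using (var fileStream = new FileStream(fileName + ".exe", FileMode.Open, FileAccess.Read))
{
    var buffer = new byte[100];
    int read;

    // Insert each chunk as soon as it is read, so the whole file
    // never has to sit in memory at once.
    while ((read = fileStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        var chunk = new byte[read];
        Array.Copy(buffer, chunk, read);

        var binary = new Binary();
        binary.Name = fileName + ".exe";
        binary.Value = chunk;
        entities.AddToBinaries(binary);
    }

    entities.SaveChanges();
}

If change tracking itself starts to use too much memory, the context can also be disposed and recreated every few thousand chunks.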
