Out of memory exception trying to write large amount of data to a file

I need to write a large amount of unsorted data (50,000,000 numbers) to a file. At runtime I get an OutOfMemoryException. How can I fix it?

private void backgroundWorkerGenNum_DoWork(object sender, DoWorkEventArgs e)
{
    int numLimit = 50000000;
    Random randomize = new Random();
    List<string> strNums = new List<string>();

    var array = Enumerable.Range(1, numLimit).ToArray();
    array = array.OrderBy(n => Guid.NewGuid()).ToArray();
    StreamWriter file = new StreamWriter("numbers.txt");
    int i = 0;
    foreach(int element in array)
    {
        file.WriteLine(element);
        ++i;
        backgroundWorkerGenNum.ReportProgress(i);
    }
}

First of all, you can just shuffle your array like this:

public static class ArrayExtender
{
    public static void Shuffle<T>(this T[] a)
    {
        Random rand = new Random();
        for (int i = a.Length - 1; i > 0; i--)
        {
            int j = rand.Next(0, i + 1);
            T tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }
}

Now we can generate the randomized data:

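private void backgroundWorkerGenNum_DoWork(object sender, DoWorkEventArgs e)
{
    int numLimit = 50000000;

    var array = Enumerable.Range(1, numLimit).ToArray();
    array.Shuffle();

    int i = 0;
    int lastPercent = -1;
    using (StreamWriter file = new StreamWriter("numbers.txt"))
    {
        foreach (int element in array)
        {
            file.WriteLine(element);
            ++i;

            // ReportProgress expects a percentage (0-100); report only when it
            // changes instead of raising 50 million ProgressChanged events.
            int percent = (int)((long)i * 100 / numLimit);
            if (percent != lastPercent)
            {
                lastPercent = percent;
                backgroundWorkerGenNum.ReportProgress(percent);
            }
        }
    }
}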

This is a very inefficient way of randomizing a collection of numbers:

array = array.OrderBy(n => Guid.NewGuid()).ToArray();

Each new Guid you generate requires 16 bytes plus a few bytes of overhead to store.

You have 5*10^7 numbers that you're randomizing by abusing the OrderBy method.

Internally, OrderBy sorts your collection by the keys you provide, which requires it to allocate memory for both the data and the keys. Assuming OrderBy stores the generated key along with each element, this would require more than 1 GB of memory.
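
As a rough check of that estimate, here is the arithmetic, assuming each element is a 4-byte int and each key is a 16-byte Guid held once per element. How OrderBy actually lays out its buffers is an assumption here, so treat the figures as a lower-bound sketch rather than a measurement:

```latex
\begin{aligned}
N &= 5 \times 10^{7} \\
M_{\text{keys}} &= 16\,N = 8 \times 10^{8}\ \text{bytes} = 800\ \text{MB} \\
M_{\text{elements}} &= 4\,N = 2 \times 10^{8}\ \text{bytes} = 200\ \text{MB} \\
M_{\text{total}} &= M_{\text{keys}} + M_{\text{elements}} = 10^{9}\ \text{bytes} \approx 1\ \text{GB}
\end{aligned}
```

This leaves out the source array and the array returned by the final ToArray() call, each another 4N bytes, so the real peak would be higher. By comparison, shuffling the int array in place needs only the single 4N term, about 200 MB.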

Consider using a shuffle method like this:

private void Shuffle(int[] data)
{
    var random = new Random();

    int n = data.Length;
    for (int i = 0; i < n; i++)
    {
        int idx = random.Next(i, n);

        int x = data[i];
        data[i] = data[idx];
        data[idx] = x;
    } 
}

instead of array.OrderBy(n => Guid.NewGuid()).ToArray().


 