
C# OutOfMemory, Mapped Memory File or Temp Database

Seeking some advice, best practice etc...

Technology: C# .NET4.0, Winforms, 32 bit

I am seeking some advice on how I can best tackle large data processing in my C# WinForms application, which experiences high memory usage (working set) and the occasional OutOfMemoryException.

The problem is that we perform a large amount of data processing "in-memory" when a "shopping-basket" is opened. In simple terms, when a "shopping-basket" is loaded we perform the following calculations:

  1. For each item in the "shopping-basket" retrieve its historical price going all the way back to the date the item first appeared in stock (could be two months, two years or two decades of data). Historical price data is retrieved from text files, over the internet, or from any format supported by a price plugin.

  2. For each item, for each day since it first appeared in stock, calculate various metrics that build a historical profile for each item in the shopping-basket.

The result is that we can potentially perform hundreds, thousands or even millions of calculations depending upon the number of items in the "shopping-basket". If the basket contains too many items we run the risk of hitting an OutOfMemoryException.

A couple of caveats:

  1. This data needs to be calculated for each item in the "shopping-basket" and the data is kept until the "shopping-basket" is closed.

  2. Even though we perform steps 1 and 2 in a background thread, speed is important, as the number of items in the "shopping-basket" can greatly affect overall calculation speed.

  3. Memory is reclaimed by the .NET garbage collector when a "shopping-basket" is closed. We have profiled our application and ensured that all references are correctly disposed and closed when a basket is closed.

  4. After all the calculations are completed the resultant data is stored in an IDictionary. "CalculatedData" is a class object whose properties are the individual metrics calculated by the above process.
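For illustration, the shape of that store is roughly as follows. This is a sketch only; the metric property names are hypothetical, since the actual metrics aren't listed above:

```csharp
using System.Collections.Generic;

// Sketch only: the real CalculatedData properties are the metrics
// produced by steps 1 and 2 above, which aren't enumerated here.
public class CalculatedData
{
    public double AveragePrice { get; set; }   // hypothetical metric
    public double Volatility { get; set; }     // hypothetical metric
}

public static class BasketStore
{
    // One entry per basket item, keyed by an item identifier,
    // and kept alive until the basket is closed.
    public static readonly IDictionary<string, CalculatedData> Results =
        new Dictionary<string, CalculatedData>();
}
```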

Some ideas I've thought about:

Obviously my main concern is to reduce the amount of memory being used by the calculations, however the volume of memory used can only be reduced if I either
1) reduce the number of metrics being calculated for each day, or
2) reduce the number of days used for the calculation.

Neither option is viable if we wish to fulfil our business requirements.

  • Memory Mapped Files
    One idea has been to use a memory-mapped file to store the data dictionary. Would this be possible/feasible, and how could we put it into place?
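For what it's worth, a minimal sketch of a file-backed memory-mapped store in C# (using System.IO.MemoryMappedFiles from the BCL; the file name and sizes are illustrative). Note that in a 32-bit process it is the mapped views, not the backing file, that consume address space, so mapping small windows at a time keeps the footprint bounded:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmfSketch
{
    static void Main()
    {
        // Create a 1 MB file-backed map; path and capacity are illustrative.
        using (var mmf = MemoryMappedFile.CreateFromFile(
            "basket.cache", FileMode.Create, null, 1024 * 1024))
        {
            // Map only a small 4 KB window of the file at a time,
            // keeping the 32-bit address-space footprint bounded.
            using (var view = mmf.CreateViewAccessor(0, 4096))
            {
                view.Write(0, 42.5);                 // persist one metric value
                double metric = view.ReadDouble(0);  // read it back
                Console.WriteLine(metric);
            }
        }
        File.Delete("basket.cache"); // discard the cache file when done
    }
}
```

The catch, as noted in an answer below, is that you must manage the serialization and layout of your records yourself; a memory-mapped file does not behave like a managed dictionary.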

  • Use a temporary database
    The idea is to use a separate (not in-memory) database which is created for the life-cycle of the application. As "shopping-baskets" are opened we can persist the calculated data to the database for repeated use, removing the need to recalculate for the same "shopping-basket".
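A minimal sketch of that life-cycle, assuming the System.Data.SQLite package (the same one used by the code further down); the file path, table and column names are illustrative:

```csharp
using System;
using System.Data.SQLite;  // third-party package, not part of the BCL
using System.IO;

class TempDbSketch
{
    static void Main()
    {
        // A throwaway database file created for the life of the application.
        string dbPath = Path.Combine(Path.GetTempPath(), "basketCache.sqlite");
        using (var con = new SQLiteConnection("Data Source=" + dbPath))
        {
            con.Open();
            using (var cmd = con.CreateCommand())
            {
                // Hypothetical schema: one row per item/day metric set.
                cmd.CommandText = @"CREATE TABLE IF NOT EXISTS CalculatedData
                                    (ItemId TEXT, Day TEXT, MetricA REAL)";
                cmd.ExecuteNonQuery();
            }
        }
        File.Delete(dbPath); // discard the cache at application shutdown
    }
}
```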

Are there any other alternatives that we should consider? What is best practice when it comes to calculations on large data and performing them outside of RAM?

Any advice is appreciated....

As an update for those stumbling upon this thread...

We ended up using SQLite as our caching solution. The SQLite database we employ exists separately from the main data store used by the application. We persist calculated data to the SQLite disk cache as it's required, and have code controlling cache invalidation etc. This was a suitable solution for us as we were able to achieve write speeds of around 100,000 records per second.

For those interested, this is the code that controls inserts into the disk cache. Full credit for this code goes to JP Richardson (who also answered below) for his excellent blog post.

internal class SQLiteBulkInsert
{
#region Class Declarations

private SQLiteCommand m_cmd;
private SQLiteTransaction m_trans;
private readonly SQLiteConnection m_dbCon;

private readonly Dictionary<string, SQLiteParameter> m_parameters = new Dictionary<string, SQLiteParameter>();

private uint m_counter;

private readonly string m_beginInsertText;

#endregion

#region Constructor

public SQLiteBulkInsert(SQLiteConnection dbConnection, string tableName)
{
    m_dbCon = dbConnection;
    m_tableName = tableName;

    var query = new StringBuilder(255);
    query.Append("INSERT INTO ["); query.Append(tableName); query.Append("] (");
    m_beginInsertText = query.ToString();
}

#endregion

#region Allow Bulk Insert

private bool m_allowBulkInsert = true;
public bool AllowBulkInsert { get { return m_allowBulkInsert; } set { m_allowBulkInsert = value; } }

#endregion

#region CommandText

public string CommandText
{
    get
    {
        if(m_parameters.Count < 1) throw new SQLiteException("You must add at least one parameter.");

        var sb = new StringBuilder(255);
        sb.Append(m_beginInsertText);

        foreach(var param in m_parameters.Keys)
        {
            sb.Append('[');
            sb.Append(param);
            sb.Append(']');
            sb.Append(", ");
        }
        sb.Remove(sb.Length - 2, 2);

        sb.Append(") VALUES (");

        foreach(var param in m_parameters.Keys)
        {
            sb.Append(m_paramDelim);
            sb.Append(param);
            sb.Append(", ");
        }
        sb.Remove(sb.Length - 2, 2);

        sb.Append(")");

        return sb.ToString();
    }
}

#endregion

#region Commit Max

private uint m_commitMax = 25000;
public uint CommitMax { get { return m_commitMax; } set { m_commitMax = value; } }

#endregion

#region Table Name

private readonly string m_tableName;
public string TableName { get { return m_tableName; } }

#endregion

#region Parameter Delimiter

private const string m_paramDelim = ":";
public string ParamDelimiter { get { return m_paramDelim; } }

#endregion

#region AddParameter

public void AddParameter(string name, DbType dbType)
{
    var param = new SQLiteParameter(m_paramDelim + name, dbType);
    m_parameters.Add(name, param);
}

#endregion

#region Flush

public void Flush()
{
    try
    {
        if (m_trans != null) m_trans.Commit();
    }
    catch (Exception ex)
    {
        throw new Exception("Could not commit transaction. See InnerException for more details", ex);
    }
    finally
    {
        if (m_trans != null) m_trans.Dispose();

        m_trans = null;
        m_counter = 0;
    }
}

#endregion

#region Insert

public void Insert(object[] paramValues)
{
    if (paramValues.Length != m_parameters.Count) 
        throw new Exception("The values array count must be equal to the count of the number of parameters.");

    m_counter++;

    if (m_counter == 1)
    {
        if (m_allowBulkInsert) m_trans = m_dbCon.BeginTransaction();
        m_cmd = m_dbCon.CreateCommand();

        foreach (var par in m_parameters.Values)
            m_cmd.Parameters.Add(par);

        m_cmd.CommandText = CommandText;
    }

    var i = 0;
    foreach (var par in m_parameters.Values)
    {
        par.Value = paramValues[i];
        i++;
    }

    m_cmd.ExecuteNonQuery();

    // Commit and start a fresh transaction once the batch size is reached.
    if (m_counter == m_commitMax)
    {
        try
        {
            if (m_trans != null) m_trans.Commit();
        }
        catch (Exception)
        {
            // Mid-batch commit failures are swallowed here; call Flush()
            // at the end of the run to surface any final commit error.
        }
        finally
        {
            if (m_trans != null)
            {
                m_trans.Dispose();
                m_trans = null;
            }

            m_counter = 0;
        }
    }
}

#endregion

}
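A typical usage pattern for the class above might look like this. This fragment isn't from the original post; the table, column names and row count are illustrative, and it assumes an open SQLiteConnection with a matching table already created:

```csharp
using System;
using System.Data;
using System.Data.SQLite;  // same package the class above depends on

class BulkInsertExample
{
    static void Main()
    {
        using (var con = new SQLiteConnection("Data Source=:memory:"))
        {
            con.Open();
            using (var cmd = con.CreateCommand())
            {
                // Hypothetical schema matching the parameters added below.
                cmd.CommandText =
                    "CREATE TABLE CalculatedData (ItemId TEXT, Day TEXT, MetricA REAL)";
                cmd.ExecuteNonQuery();
            }

            var bulk = new SQLiteBulkInsert(con, "CalculatedData");
            bulk.AddParameter("ItemId", DbType.String);
            bulk.AddParameter("Day", DbType.String);
            bulk.AddParameter("MetricA", DbType.Double);

            // Rows are batched into transactions of CommitMax (25,000 by
            // default); that batching is what makes the inserts fast.
            for (int day = 0; day < 50000; day++)
                bulk.Insert(new object[] { "ITEM-1", day.ToString(), day * 0.5 });

            bulk.Flush();  // commit the final partial batch
        }
    }
}
```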

The easiest solution is a database, perhaps SQLite. Memory-mapped files don't automatically become dictionaries; you would have to code all the memory management yourself, and thereby fight with the .NET GC system itself for ownership of the data.

If you're interested in trying the memory-mapped file approach, you can try it now. I wrote a small .NET package called MemMapCache that in essence creates a key/value database backed by memory-mapped files. It's a bit of a hacky concept, but the MemMapCache.exe process keeps all references to the memory-mapped files, so that if your application crashes you don't have to worry about losing the state of your cache.

It's very simple to use and you should be able to drop it into your code without too many modifications. Here is an example using it: https://github.com/jprichardson/MemMapCache/blob/master/TestMemMapCache/MemMapCacheTest.cs

Maybe it'd be of some use to you to at least further figure out what you need to do for an actual solution.

Please let me know if you do end up using it. I'd be interested in your results.

However, long-term, I'd recommend Redis.
