
C# OutOfMemory, Mapped Memory File or Temp Database

Seeking some advice, best practice, etc...

Technology: C# .NET 4.0, WinForms, 32-bit

I am seeking some advice on how best to tackle large data processing in my C# WinForms application, which experiences high memory usage (working set) and the occasional OutOfMemory exception.

The problem is that we perform a large amount of data processing in memory when a "shopping-basket" is opened. In simple terms, when a "shopping-basket" is loaded we perform the following calculations:

  1. For each item in the "shopping-basket", retrieve its historical price going all the way back to the date the item first appeared in stock (this could be two months, two years or two decades of data). Historical price data is retrieved from text files, over the internet, or in any format supported by a price plugin.

  2. For each item, for each day since it first appeared in stock, calculate the various metrics that build up a historical profile for each item in the shopping-basket (a sketch of this loop follows below).
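
To make the shape of that work concrete, here is a minimal sketch of the load step; every name in it (PricePoint, pricePlugin, ComputeMetrics, CalculatedData, basket, results) is illustrative, not taken from the real application:

// Hypothetical sketch of the basket-load loop described above; all names
// are illustrative, not the application's real types.
foreach (var item in basket.Items)
{
    // Step 1: full daily price history, possibly decades of records.
    IList<PricePoint> history = pricePlugin.GetHistory(item.Symbol);

    // Step 2: one set of metrics per in-stock day builds the profile.
    var profile = new CalculatedData();
    foreach (PricePoint day in history)
    {
        profile.AddDay(day.Date, ComputeMetrics(day, history));
    }

    // Everything stays resident until the basket is closed.
    results[item.Symbol] = profile;
}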

The result is that we can potentially perform hundreds of thousands and/or millions of calculations, depending upon the number of items in the "shopping-basket". If the basket contains too many items we run the risk of hitting an OutOfMemory exception.

A few caveats:

  1. This data needs to be calculated for each item in the "shopping-basket", and the data is kept until the "shopping-basket" is closed.

  2. Even though we perform steps 1 and 2 in a background thread, speed is important, as the number of items in the "shopping-basket" can greatly affect the overall calculation speed.

  3. Memory is reclaimed by the .NET garbage collector when a "shopping-basket" is closed. We have profiled our application and ensured that all references are correctly disposed of and closed when a basket is closed.

  4. After all the calculations are completed, the resultant data is stored in an IDictionary. CalculatedData is a class object whose properties are the individual metrics calculated by the above process (an illustrative sketch follows this list).
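
For illustration, the result store has roughly this shape; the key type, the metric layout and the numbers in the comment are assumptions, since none of them appear in the post:

// Illustrative declaration; the real key type and metric layout are assumptions.
private readonly IDictionary<string, CalculatedData> m_basketData =
    new Dictionary<string, CalculatedData>();

// Rough (assumed) arithmetic for why a 32-bit process struggles: 500 items
// x 10 years x ~365 days x 50 double metrics x 8 bytes is roughly 730 MB of
// raw values before any object overhead, a large fraction of the ~2 GB of
// user address space a 32-bit process gets.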

Some ideas I've thought about:

Obviously my main concern is to reduce the amount of memory being used by the calculations; however, the volume of memory used can only be reduced if I:
1) reduce the number of metrics being calculated for each day, or
2) reduce the number of days used for the calculation.

Neither of these options is viable if we wish to fulfil our business requirements.

  • Memory Mapped Files
    One idea has been to use memory-mapped files to store the data dictionary. Would this be possible/feasible, and how could we put it into place? (See the first sketch after this list.)

  • Use a temporary database
    The idea is to use a separate (not in-memory) database that can be created for the life-cycle of the application. As "shopping-baskets" are opened, we could persist the calculated data to the database for repeated use, removing the need to recalculate for the same "shopping-basket". (See the second sketch after this list.)
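
On the memory-mapped file idea, here is a minimal sketch of what persisting a basket's results through System.IO.MemoryMappedFiles (available since .NET 4.0) could look like. The file name, capacity and use of BinaryFormatter are assumptions, and the data object would need to be marked [Serializable]:

using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.Serialization.Formatters.Binary;

// Sketch only: back the calculated results with a memory-mapped file.
// Note the file holds raw bytes, not a dictionary - offsets, lookup and
// eviction bookkeeping would all have to be written by hand.
using (var mmf = MemoryMappedFile.CreateFromFile(
    Path.Combine(Path.GetTempPath(), "basket.cache"),
    FileMode.Create, "basketCache", 512L * 1024 * 1024))
using (var stream = mmf.CreateViewStream())
{
    new BinaryFormatter().Serialize(stream, basketData); // hypothetical data
}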
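And on the temporary database idea, a sketch of spinning up a throwaway SQLite file for the application's lifetime. The path and pragmas are typical cache-friendly settings assumed here, not taken from the post; relaxing synchronous trades crash durability for write speed, which is acceptable for data that can be recalculated:

using System.Data.SQLite;
using System.IO;

// Sketch: a disposable SQLite cache created at startup.
string cachePath = Path.Combine(Path.GetTempPath(), "calcCache.db");
SQLiteConnection.CreateFile(cachePath);

var dbCon = new SQLiteConnection("Data Source=" + cachePath + ";Version=3;");
dbCon.Open();

// Speed-oriented pragmas for a rebuildable cache: losing the file in a
// crash is acceptable because everything in it can be recalculated.
using (var cmd = new SQLiteCommand(
    "PRAGMA journal_mode = MEMORY; PRAGMA synchronous = OFF;", dbCon))
{
    cmd.ExecuteNonQuery();
}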

Are there any other alternatives we should consider? What is best practice when it comes to performing calculations on large volumes of data outside of RAM?

Any advice is appreciated...

As an update for those stumbling upon this thread...

We ended up using SQLite as our caching solution. The SQLite database we employ exists separately from the main data store used by the application. We persist calculated data to the SQLite disk cache as it is required, with code controlling cache invalidation, etc. This was a suitable solution for us, as we were able to achieve write speeds of up to around 100,000 records per second.

For those interested, this is the code that controls inserts into the disk cache. Full credit for this code goes to JP Richardson (shown answering a question here) for his excellent blog post.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SQLite;
using System.Text;

// Batches INSERTs inside a single transaction and commits every CommitMax
// rows, which is what makes SQLite bulk inserts fast.
internal class SQLiteBulkInsert
{
#region Class Declarations

private SQLiteCommand m_cmd;
private SQLiteTransaction m_trans;
private readonly SQLiteConnection m_dbCon;

private readonly Dictionary<string, SQLiteParameter> m_parameters = new Dictionary<string, SQLiteParameter>();

private uint m_counter;

private readonly string m_beginInsertText;

#endregion

#region Constructor

public SQLiteBulkInsert(SQLiteConnection dbConnection, string tableName)
{
    m_dbCon = dbConnection;
    m_tableName = tableName;

    var query = new StringBuilder(255);
    query.Append("INSERT INTO ["); query.Append(tableName); query.Append("] (");
    m_beginInsertText = query.ToString();
}

#endregion

#region Allow Bulk Insert

private bool m_allowBulkInsert = true;
public bool AllowBulkInsert { get { return m_allowBulkInsert; } set { m_allowBulkInsert = value; } }

#endregion

#region CommandText

public string CommandText
{
    get
    {
        if(m_parameters.Count < 1) throw new SQLiteException("You must add at least one parameter.");

        var sb = new StringBuilder(255);
        sb.Append(m_beginInsertText);

        foreach(var param in m_parameters.Keys)
        {
            sb.Append('[');
            sb.Append(param);
            sb.Append(']');
            sb.Append(", ");
        }
        sb.Remove(sb.Length - 2, 2);

        sb.Append(") VALUES (");

        foreach(var param in m_parameters.Keys)
        {
            sb.Append(m_paramDelim);
            sb.Append(param);
            sb.Append(", ");
        }
        sb.Remove(sb.Length - 2, 2);

        sb.Append(")");

        return sb.ToString();
    }
}

#endregion

#region Commit Max

private uint m_commitMax = 25000;
public uint CommitMax { get { return m_commitMax; } set { m_commitMax = value; } }

#endregion

#region Table Name

private readonly string m_tableName;
public string TableName { get { return m_tableName; } }

#endregion

#region Parameter Delimiter

private const string m_paramDelim = ":";
public string ParamDelimiter { get { return m_paramDelim; } }

#endregion

#region AddParameter

public void AddParameter(string name, DbType dbType)
{
    var param = new SQLiteParameter(m_paramDelim + name, dbType);
    m_parameters.Add(name, param);
}

#endregion

#region Flush

public void Flush()
{
    try
    {
        if (m_trans != null) m_trans.Commit();
    }
    catch (Exception ex)
    {
        throw new Exception("Could not commit transaction. See InnerException for more details", ex);
    }
    finally
    {
        if (m_trans != null) m_trans.Dispose();

        m_trans = null;
        m_counter = 0;
    }
}

#endregion

#region Insert

public void Insert(object[] paramValues)
{
    if (paramValues.Length != m_parameters.Count) 
        throw new Exception("The values array count must be equal to the count of the number of parameters.");

    m_counter++;

    if (m_counter == 1)
    {
        if (m_allowBulkInsert) m_trans = m_dbCon.BeginTransaction();
        m_cmd = m_dbCon.CreateCommand();

        foreach (var par in m_parameters.Values)
            m_cmd.Parameters.Add(par);

        m_cmd.CommandText = CommandText;
    }

    var i = 0;
    foreach (var par in m_parameters.Values)
    {
        par.Value = paramValues[i];
        i++;
    }

    m_cmd.ExecuteNonQuery();

    // Commit and reset once the batch reaches CommitMax rows.
    if (m_counter == m_commitMax)
    {
        try
        {
            if (m_trans != null) m_trans.Commit();
        }
        catch (Exception)
        {
            // Swallowed here; Flush() is the call that surfaces commit errors.
        }
        finally
        {
            if (m_trans != null)
            {
                m_trans.Dispose();
                m_trans = null;
            }

            m_counter = 0;
        }
    }
}

#endregion

}
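
For completeness, a usage sketch of the class above; the connection string, table and column names are illustrative, and the target table is assumed to exist already:

using (var con = new SQLiteConnection("Data Source=calcCache.db;Version=3;"))
{
    con.Open();

    // Assumes a DailyMetrics(ItemId, Day, Metric) table already exists.
    var bulk = new SQLiteBulkInsert(con, "DailyMetrics");
    bulk.AddParameter("ItemId", DbType.String);
    bulk.AddParameter("Day", DbType.DateTime);
    bulk.AddParameter("Metric", DbType.Double);

    // Each call buffers into the open transaction; every CommitMax rows
    // (25,000 by default) the batch is committed automatically.
    bulk.Insert(new object[] { "ABC", new DateTime(2012, 1, 3), 42.5 });

    bulk.Flush(); // commit whatever remains in the final partial batch
}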

The easiest solution is a database, perhaps SQLite. Memory-mapped files don't automatically become dictionaries; you would have to code all the memory management yourself, and thereby fight with the .NET GC system itself for ownership of the data.

If you're interested in trying the memory-mapped file approach, you can try it now. I wrote a small native .NET package called MemMapCache that, in essence, creates a key/value database backed by memory-mapped files. It's a bit of a hacky concept, but the MemMapCache.exe program keeps all of the references to the memory-mapped files, so that if your application crashes you don't have to worry about losing the state of your cache.

It's very simple to use, and you should be able to drop it into your code without too many modifications. Here is an example using it: https://github.com/jprichardson/MemMapCache/blob/master/TestMemMapCache/MemMapCacheTest.cs

Maybe it'd be of some use to you, at least to further figure out what you need to do for an actual solution.

Please let me know if you do end up using it. I'd be interested in your results.

However, long-term, I'd recommend Redis.
