How to improve the performance of Entity Framework Code?

In the project, I need to call an external API based on time. So, for one day, I may need to call the API 24 times, one call per one-hour period. The API result is an XML file with 6 fields. I need to insert this data into a table. On average, each hour has about 20,000 rows of data.

The table has these 6 columns:

col1, col2, col3, col4, col5, col6

When all 6 columns are the same, we consider two rows to be the same, and we should not insert duplicates.

I'm using C# and Entity Framework for this:

foreach (XmlNode node in nodes)
{
    try
    {
        count++;

        CallData data = new CallData();
        ...
        // get all data and set in 'data'

        // check whether in database already                        
        var q = ctx.CallDatas.Where(x => x.col1 == data.col1
                    && x.col2 == data.col2
                    && x.col3 == data.col3
                    && x.col4 == data.col4
                    && x.col5 == data.col5
                    && x.col6 == data.col6
                ).Any();
        if (q)
        {
            // exists in database, skip
            // log info
        }
        else
        {
            string key = $"{data.col1}|{data.col2}|{data.col3}|{data.col4}|{data.col5}|{data.col6}";
            // check whether in current chunk already
            if (dic.ContainsKey(key))
            {
                // in current chunk, skip
                // log info
            }
            else
            {
                // insert
                ctx.CallDatas.Add(data);

                // update dic
                dic.Add(key, true);
            }
        }
    }
    catch (Exception ex)
    {
        // log error
    }
}
Logger.InfoFormat("Saving changes ...");
if (ctx.ChangeTracker.HasChanges())
{
    await ctx.SaveChangesAsync();
}
Logger.InfoFormat("Saving changes ... Done.");

The code works fine. However, we will need to use this code to run over the past several months of data. The issue is: the code runs slowly, since for each row it needs to check whether the row already exists.

Are there any suggestions to improve the performance?

Thanks

You don't show the code for when the context is created, or its life-cycle. I'm inclined to point you to your indexes on the table. If these aren't primary keys then you might see the performance issue there. If you are doing full table scans, it will be progressively slower. With that said, there are two separate ways to handle this:
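On the index point: a minimal sketch, assuming SQL Server and that the EF entity maps to a table named CallDatas with the six columns shown above (names hypothetical), could make the index unique so the database itself rejects duplicates:

```sql
-- Hypothetical table/column names, taken from the EF entity in the question.
-- A unique composite index turns the per-row existence check into an index
-- seek instead of a table scan, and rejects duplicate rows outright.
CREATE UNIQUE INDEX IX_CallDatas_AllCols
    ON CallDatas (col1, col2, col3, col4, col5, col6);
```

Note that this only works if the combined key width stays within SQL Server's index key size limit, and inserts of duplicates would then fail rather than be silently skipped, so the application must handle that error.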

The EF Native way: You can explicitly create a new connection on each interaction (avoiding change tracking for all entries, reducing progressive slowdown). Also, your save is async but your `Any` statement is sync. Using async for that as well might help take some pressure off the current thread if it's waiting.

// Start your context scope closer to the data call; if the loop is long
// running, you could be building up tracked changes in the cache, and this
// prevents that situation.
using (YourEntity ctx = new YourEntity())
{
    CallData data = new CallData();
    if (await ctx.CallDatas.Where(x => x.col1 == data.col1
        && x.col2 == data.col2
        && x.col3 == data.col3
        && x.col4 == data.col4
        && x.col5 == data.col5
        && x.col6 == data.col6
        ).AnyAsync()
        )
    { 
        // exists in database, skip
        // log info
    }
    else
    {
        string key = $"{data.col1}|{data.col2}|{data.col3}|{data.col4}|{data.col5}|{data.col6}";
        // check whether in current chunk already
        if (dic.ContainsKey(key))
        {
            // in current chunk, skip
            // log info
        }
        else
        {
            // insert
            ctx.CallDatas.Add(data);
            await ctx.SaveChangesAsync();
            // update dic
            dic.Add(key, true);
        }
    }
}

Optional way: Look into inserting the data using a bulk operation via a stored procedure. 20k rows is trivial, and you can still use Entity Framework for that as well. See https://stackoverflow.com/a/9837927/1558178

I have created my own version of this (customized for my specific needs) and have found that it works well and gives more control for bulk inserts.

I have used this approach to insert 100k records at a time. My duplicate-checking logic lives in the stored procedure, which gives me better control as well as reducing the over-the-wire calls to 0 reads and 1 write. This should take only a second or two to execute, assuming your stored procedure is optimized.
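A minimal sketch of what such a stored procedure could look like, assuming SQL Server, a table-valued parameter, and the CallDatas schema from the question (all names hypothetical, and column types assumed INT for brevity):

```sql
-- Hypothetical names and types throughout; adjust to your schema.
CREATE TYPE dbo.CallDataTable AS TABLE
    (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT, col6 INT);
GO
CREATE PROCEDURE dbo.BulkInsertCallData
    @rows dbo.CallDataTable READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- One set-based statement: DISTINCT removes duplicates within the batch,
    -- NOT EXISTS removes rows already present in the table.
    INSERT INTO CallDatas (col1, col2, col3, col4, col5, col6)
    SELECT DISTINCT r.col1, r.col2, r.col3, r.col4, r.col5, r.col6
    FROM @rows AS r
    WHERE NOT EXISTS (
        SELECT 1 FROM CallDatas AS c
        WHERE c.col1 = r.col1 AND c.col2 = r.col2 AND c.col3 = r.col3
          AND c.col4 = r.col4 AND c.col5 = r.col5 AND c.col6 = r.col6);
END
```

From C#, the batch would be passed as a DataTable through a SqlParameter with SqlDbType.Structured, replacing the per-row round trips with a single call.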

Different approach:

Save all rows, duplicates included - this should be very efficient.

When you use data from the table, use DISTINCT for all fields.
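The read side of this "insert everything, de-duplicate on read" approach is a one-liner, again assuming the CallDatas table from the question:

```sql
-- De-duplication deferred to query time; could also be wrapped in a view.
SELECT DISTINCT col1, col2, col3, col4, col5, col6
FROM CallDatas;
```

The trade-off is storage and query cost: every read pays for the DISTINCT, and the table grows with every duplicate, so this fits best when inserts vastly outnumber reads.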

For raw, bulk operations like this I would consider avoiding EF entities and context tracking and merely execute SQL through the context:

var sql = "IF NOT EXISTS(SELECT 1 FROM CallDates WHERE Col1=@p0 AND Col2=@p1 AND Col3=@p2 AND Col4=@p3 AND Col5=@p4 AND Col6=@p5) "
        + "INSERT INTO CallDates(Col1,Col2,Col3,Col4,Col5,Col6) VALUES (@p0,@p1,@p2,@p3,@p4,@p5)";
// Parameterized to avoid SQL injection; ExecuteSqlCommand binds the values to @p0..@p5 in order.
context.Database.ExecuteSqlCommand(sql, data.Col1, data.Col2, data.Col3, data.Col4, data.Col5, data.Col6);

This does away with the extra checks and logging - effectively just raw SQL with duplicate detection.
