Improve EFCore Query for fast operation

I read a SQLite database (about 3 MB, so there is not much data; each table has about 1 or 2 thousand rows), extract information from it, and then add this information to a new database. The whole operation takes about 40 seconds.

How can I reduce this time and get the operation done as quickly as possible? (Task, Parallel, async, ...)

I am currently using this code:

await Task.Run(async () =>
            {
                var pkgs = new ManifestTable();
                var mydb = new dbContext();
                await mydb.Database.EnsureDeletedAsync();
                await mydb.Database.EnsureCreatedAsync();
                using (var msixDB = new MSIXContext())
                {
                    foreach (var item in await msixDB.IdsMSIXTable.ToListAsync())
                    {
                        var rowId = item.rowid;
                        var manifests = await msixDB.Set<ManifestMSIXTable>().Where((e) => e.id == rowId).ToListAsync();

                        foreach (var manifest in manifests)
                        {
                            pkgs = new ManifestTable();
                            pkgs.PackageId = item.id;


                            var productMap = await msixDB.ProductCodesMapMSIXTable.FirstOrDefaultAsync((e) => e.manifest == manifest.rowid);
                            if (productMap != null)
                            {
                                var prdCode = await msixDB.ProductCodesMSIXTable.FirstOrDefaultAsync((e) => e.rowid == productMap.productcode);
                                if (prdCode != null)
                                {
                                    pkgs.ProductCode = prdCode.productcode;
                                }
                            }
                            var publisherMap = await msixDB.Set<PublishersMapMSIXTable>().FirstOrDefaultAsync((e) => e.manifest == manifest.rowid);

                            if (publisherMap != null)
                            {
                                var publisher = await msixDB.PublishersMSIXTable.FirstOrDefaultAsync((e) => e.rowid == publisherMap.norm_publisher);

                                if (publisher != null)
                                {
                                    pkgs.Publisher = publisher.norm_publisher;
                                }
                            }

                            var pathPart = manifest.pathpart;
                            var yml = await msixDB.PathPartsMSIXTable.FirstOrDefaultAsync((e) => e.rowid == pathPart);
                            if (yml != null)
                            {
                                pkgs.YamlName = yml.pathpart;
                            }

                            var version = await msixDB.VersionsMSIXTable.FirstOrDefaultAsync((e) => e.rowid == manifest.version);
                            if (version != null)
                            {
                                pkgs.Version = version.version;
                            }
                            await mydb.ManifestTable.AddAsync(pkgs);
                        }
                    }
                     await mydb.SaveChangesAsync();
                }

            });

Treating a database as object storage is about the worst idea there is. You have to reduce database round trips as much as possible; in your case, to just one request. Also, do not play with Task.Run, Parallel, etc. if you do not know which part is slow. In your case it is the database round trips.

// Recreate the target database.
var mydb = new dbContext();
await mydb.Database.EnsureDeletedAsync();
await mydb.Database.EnsureCreatedAsync();

using (var msixDB = new MSIXContext())
{
    // A single query replaces the nested loops. Each Take(1).DefaultIfEmpty() source is the
    // set-based counterpart of FirstOrDefaultAsync: at most one row, or null when nothing matches.
    var query =
        from item in msixDB.IdsMSIXTable
        from manifest in msixDB.Set<ManifestMSIXTable>().Where(e => e.id == item.rowid)
        from productMap in msixDB.ProductCodesMapMSIXTable.Where(e => e.manifest == manifest.rowid).Take(1).DefaultIfEmpty()
        from prdCode in msixDB.ProductCodesMSIXTable.Where(e => e.rowid == productMap.productcode).Take(1).DefaultIfEmpty()
        from publisherMap in msixDB.Set<PublishersMapMSIXTable>().Where(e => e.manifest == manifest.rowid).Take(1).DefaultIfEmpty()
        from publisher in msixDB.PublishersMSIXTable.Where(e => e.rowid == publisherMap.norm_publisher).Take(1).DefaultIfEmpty()
        from yml in msixDB.PathPartsMSIXTable.Where(e => e.rowid == manifest.pathpart).Take(1).DefaultIfEmpty()
        from version in msixDB.VersionsMSIXTable.Where(e => e.rowid == manifest.version).Take(1).DefaultIfEmpty()
        select new ManifestTable
        {
            PackageId = item.id,
            ProductCode = prdCode.productcode,
            Publisher = publisher.norm_publisher,
            YamlName = yml.pathpart,
            Version = version.version
        };

    // One read from the source database, one batched write to the target.
    mydb.ManifestTable.AddRange(await query.ToListAsync());
    await mydb.SaveChangesAsync();
}

You should start by seeing if there are any algorithmic improvements before trying to do things in parallel etc.

You have two nested loops, and the inner query filters the manifest table once for every row of the outer table. If each table has a few thousand rows and that filter cannot use an index, the work is on the order of 10^6 row comparisons: not terrible, but a fair amount.

In the inner loop you are then running a whole bunch of FirstOrDefaultAsync statements. If the columns they filter on are not indexed, every call has to scan all rows, and this will be slow. So, to start off, ensure you have appropriate indices on all the tables, so that finding a specific row is a fast index lookup rather than a full table scan.

You also seem to be doing repeated lookups in PublishersMapMSIXTable with the same parameters. Avoiding unnecessarily repeated operations should be one of the first things to fix, since it is just wasted cycles.
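One way to avoid repeating the same lookups is to load the small lookup tables once and answer every later lookup from memory. The sketch below is only an illustration of that idea: `PublisherMapRow`, `PublisherRow` and `PublisherLookup` are hypothetical stand-ins for the EF entities, and the `long` key type is an assumption about the schema.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the EF entities; only the columns used here.
public record PublisherMapRow(long manifest, long norm_publisher);
public record PublisherRow(long rowid, string norm_publisher);

public sealed class PublisherLookup
{
    private readonly Dictionary<long, long> _publisherIdByManifest = new();
    private readonly Dictionary<long, string> _nameByPublisherId = new();

    // Build both dictionaries once, before the loops start.
    public PublisherLookup(IEnumerable<PublisherMapRow> maps, IEnumerable<PublisherRow> publishers)
    {
        foreach (var map in maps)
        {
            // TryAdd keeps the first mapping seen for a manifest,
            // similar in spirit to FirstOrDefaultAsync.
            _publisherIdByManifest.TryAdd(map.manifest, map.norm_publisher);
        }

        foreach (var publisher in publishers)
        {
            _nameByPublisherId.TryAdd(publisher.rowid, publisher.norm_publisher);
        }
    }

    // Returns the publisher name for a manifest, or null when either step finds nothing.
    public string? Find(long manifestRowId)
    {
        if (_publisherIdByManifest.TryGetValue(manifestRowId, out var publisherId)
            && _nameByPublisherId.TryGetValue(publisherId, out var name))
        {
            return name;
        }

        return null;
    }
}
```

Inside the loop, a call such as `pkgs.Publisher = lookup.Find(manifest.rowid);` would then take the place of two awaited queries per manifest. With tables of only a few thousand rows, holding them in memory should cost very little.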

If the whole operation is run on a background thread, it is unlikely that all the async calls will help much: they will save a little bit of memory, but cause some bouncing between threads. So if performance is important, the regular synchronous methods will probably be a little bit faster.

And as always with regard to performance: measure. A good performance profiler should tell you where most of the time is spent, and adding some stopwatches is easy if you do not have one. Even very experienced programmers can be completely wrong if they try to guess what the slow parts are.
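If no profiler is at hand, a small helper around `System.Diagnostics.Stopwatch` is enough to see which phase dominates. This is a minimal sketch; the `Timing` class and its method names are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Timing
{
    // Runs the action, prints how long it took, and returns the elapsed time.
    public static TimeSpan Measure(string label, Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        Console.WriteLine($"{label}: {stopwatch.ElapsedMilliseconds} ms");
        return stopwatch.Elapsed;
    }

    // Same idea for awaited work such as ToListAsync or SaveChangesAsync.
    public static async Task<TimeSpan> MeasureAsync(string label, Func<Task> action)
    {
        var stopwatch = Stopwatch.StartNew();
        await action();
        stopwatch.Stop();
        Console.WriteLine($"{label}: {stopwatch.ElapsedMilliseconds} ms");
        return stopwatch.Elapsed;
    }
}
```

Wrapping the read phase and the write phase separately, for example `await Timing.MeasureAsync("save", () => mydb.SaveChangesAsync());`, should show whether the 40 seconds go into the many small reads or into the final write.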


 