
JSON Array to Entity Framework Core VERY Slow?

I'm working on a utility to read through a JSON file I've been given and load it into SQL Server. My weapon of choice is a .NET Core Console App (I'm trying to do all of my new work with .NET Core unless there is a compelling reason not to). I have the whole thing "working", but there is clearly a problem somewhere, because the performance is truly horrifying, almost to the point of being unusable.

The JSON file is approximately 27MB and contains a main array of 214 elements. Each of those contains a couple of fields along with an array of 150-350 records (each record has several fields and potentially one or two small arrays of fewer than 5 records). Total records are approximately 35,000.

In the code below I've changed some names and stripped out a few of the fields to keep it more readable, but all of the logic and code that does actual work is unchanged.

Keep in mind, I've done a lot of testing with the placement and number of calls to SaveChanges(), thinking initially that the number of trips to the Db was the problem. Although the version below calls SaveChanges() once for each iteration of the 214-record loop, I've tried moving it outside of the entire looping structure and there is no discernible change in performance. In other words, with zero trips to the Db, this is still SLOW. How slow, you ask? How does > 24 hours to run hit you? I'm willing to try anything at this point and am even considering moving the whole process into SQL Server, but I would much rather work in C# than TSQL.

static void Main(string[] args)
{
    string statusMsg = String.Empty;

    JArray sets = JArray.Parse(File.ReadAllText(@"C:\Users\Public\Downloads\ImportFile.json"));
    try
    {
        using (var _db = new WidgetDb())
        {
            for (int s = 0; s < sets.Count; s++)
            {
                Console.WriteLine($"{s.ToString()}: {sets[s]["name"]}");

                // First we create the Set
                Set eSet = new Set()
                {
                    SetCode = (string)sets[s]["code"],
                    SetName = (string)sets[s]["name"],
                    Type = (string)sets[s]["type"],
                    Block = (string)sets[s]["block"] ?? ""
                };
                _db.Entry(eSet).State = Microsoft.EntityFrameworkCore.EntityState.Added;

                JArray widgets = sets[s]["widgets"].ToObject<JArray>();
                for (int c = 0; c < widgets.Count; c++)
                {
                    Widget eWidget = new Widget()
                    {
                        WidgetId = (string)widgets[c]["id"],
                        Layout = (string)widgets[c]["layout"] ?? "",
                        WidgetName = (string)widgets[c]["name"],
                        WidgetNames = "",
                        ReleaseDate = releaseDate,
                        SetCode = (string)sets[s]["code"]
                    };

                    // WidgetColors
                    if (widgets[c]["colors"] != null)
                    {
                        JArray widgetColors = widgets[c]["colors"].ToObject<JArray>();

                        for (int cc = 0; cc < widgetColors.Count; cc++)
                        {
                            WidgetColor eWidgetColor = new WidgetColor()
                            {
                                WidgetId = eWidget.WidgetId,
                                Color = (string)widgets[c]["colors"][cc]
                            };
                            _db.Entry(eWidgetColor).State = Microsoft.EntityFrameworkCore.EntityState.Added;
                        }
                    }

                    // WidgetTypes
                    if (widgets[c]["types"] != null)
                    {
                        JArray widgetTypes = widgets[c]["types"].ToObject<JArray>();

                        for (int ct = 0; ct < widgetTypes.Count; ct++)
                        {
                            WidgetType eWidgetType = new WidgetType()
                            {
                                WidgetId = eWidget.WidgetId,
                                Type = (string)widgets[c]["types"][ct]
                            };
                            _db.Entry(eWidgetType).State = Microsoft.EntityFrameworkCore.EntityState.Added;
                        }
                    }

                    // WidgetVariations
                    if (widgets[c]["variations"] != null)
                    {
                        JArray widgetVariations = widgets[c]["variations"].ToObject<JArray>();

                        for (int cv = 0; cv < widgetVariations.Count; cv++)
                        {
                            WidgetVariation eWidgetVariation = new WidgetVariation()
                            {
                                WidgetId = eWidget.WidgetId,
                                Variation = (string)widgets[c]["variations"][cv]
                            };
                            _db.Entry(eWidgetVariation).State = Microsoft.EntityFrameworkCore.EntityState.Added;
                        }
                    }
                }
                _db.SaveChanges();
            }
        }

        statusMsg = "Import Complete";
    }
    catch (Exception ex)
    {
        statusMsg = ex.Message + " (" + ex.InnerException + ")";
    }

    Console.WriteLine(statusMsg);
    Console.ReadKey();
} 

I had an issue with this kind of code: lots of loops and tons of state changes.

Any change or manipulation you make in the _db context generates a "trace" of it in the context's change tracking, and that makes your context slower each time. Read more here.

The fix for me was to create a new EF context (_db) at some key points. It saved me a few hours per run!

You could try creating a new instance of _db on each iteration of this loop:

    "contains a main array of 214 elements"
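
A minimal sketch of that idea, assuming the question's WidgetDb context. The helper opens a fresh context for every batch, so the change tracker never holds more than one batch of entities. The names FreshContextImport, Run, createContext and importBatch are hypothetical and not part of EF Core; the helper only requires the context to be IDisposable, which DbContext is.

```csharp
using System;
using System.Collections.Generic;

public static class FreshContextImport
{
    // Processes each batch with its own short-lived context, so entities
    // tracked for earlier batches are released when that context is disposed.
    // createContext: hypothetical factory, e.g. () => new WidgetDb().
    // importBatch:   adds the batch's entities, saves them, and returns
    //                the number of rows written (e.g. db.SaveChanges()).
    public static int Run<TContext, TBatch>(
        IEnumerable<TBatch> batches,
        Func<TContext> createContext,
        Func<TContext, TBatch, int> importBatch)
        where TContext : IDisposable
    {
        int totalSaved = 0;
        foreach (TBatch batch in batches)
        {
            using (TContext db = createContext())
            {
                totalSaved += importBatch(db, batch);
            }
        }
        return totalSaved;
    }
}
```

With the code from the question, the batches would be the 214 elements of the parsed sets array, createContext would be () => new WidgetDb(), and importBatch would contain the body of the outer loop and end with return db.SaveChanges().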

If that makes no difference, try adding some stopwatches to get a better idea of what is taking so long, and where.
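
One way to do that, sketched with System.Diagnostics.Stopwatch: accumulate elapsed time per named phase (for example "parse", "track" and "save") and print the totals at the end. PhaseTimer and its members are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public sealed class PhaseTimer
{
    private readonly Dictionary<string, TimeSpan> _totals =
        new Dictionary<string, TimeSpan>();

    // Runs the given work and adds its elapsed time to the phase's total,
    // even if the work throws.
    public void Measure(string phase, Action work)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            work();
        }
        finally
        {
            stopwatch.Stop();
            _totals.TryGetValue(phase, out TimeSpan soFar);
            _totals[phase] = soFar + stopwatch.Elapsed;
        }
    }

    // Accumulated time per phase.
    public IReadOnlyDictionary<string, TimeSpan> Totals => _totals;

    // Writes one line per phase to the console.
    public void Report()
    {
        foreach (var entry in _totals)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value.TotalMilliseconds:F0} ms");
        }
    }
}
```

Wrapping the JSON parsing, the entity-tracking loops and the SaveChanges() call in separate Measure calls should show which of them dominates the run time.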

If you're making thousands of updates, then EF is not really the way to go. Something like SqlBulkCopy will do the trick.
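
As a rough sketch of what that could look like for one of the child tables: collect the rows into a DataTable and hand it to SqlBulkCopy in a single call. The table name dbo.WidgetColors, the column names, the batch size and the WidgetColorBulkLoader class are assumptions for illustration; on .NET Core, SqlBulkCopy comes from the System.Data.SqlClient (or Microsoft.Data.SqlClient) NuGet package.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient; // NuGet package on .NET Core

public static class WidgetColorBulkLoader
{
    // Builds an in-memory table whose columns mirror the (assumed) target table.
    public static DataTable BuildTable(
        IEnumerable<(string WidgetId, string Color)> rows)
    {
        var table = new DataTable();
        table.Columns.Add("WidgetId", typeof(string));
        table.Columns.Add("Color", typeof(string));
        foreach (var row in rows)
        {
            table.Rows.Add(row.WidgetId, row.Color);
        }
        return table;
    }

    // Sends the whole table to SQL Server in one bulk operation.
    // "dbo.WidgetColors" is an assumed table name.
    public static void Write(string connectionString, DataTable table)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.WidgetColors";
            bulk.BatchSize = 5000; // arbitrary value, tune as needed
            bulk.ColumnMappings.Add("WidgetId", "WidgetId");
            bulk.ColumnMappings.Add("Color", "Color");
            bulk.WriteToServer(table);
        }
    }
}
```

The same pattern would apply to the other tables; parent rows (sets, widgets) would need to be written before the child rows that reference them if foreign keys are enforced.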

You could try the bulkwriter library.

IEnumerable<string> ReadFile(string path)
{
    using (var stream = File.OpenRead(path))
    using (var reader = new StreamReader(stream))
    {
        while (reader.Peek() >= 0)
        {
            yield return reader.ReadLine();
        }
    }
}

var items =  
    from line in ReadFile(@"C:\products.csv")
    let values = line.Split(',')
    select new Product {Sku = values[0], Name = values[1]};

then

using (var bulkWriter = new BulkWriter<Product>(connectionString)) {  
     bulkWriter.WriteToDatabase(items);
}
