Extremely slow EF startup - 15 minutes

Some time ago I built a system in which users can define categories with custom fields for objects. Each object then has field values (FieldValue) based on its category. The classes are below:

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations; // [Required], [Key]
using Newtonsoft.Json;                       // [JsonIgnore] (assuming Json.NET)

public class DbCategory
{
    public int Id { get; set; }

    [Required]
    public string Name { get; set; }

    [Required]
    public TextDbField MainField { get; set; }
    public List<DbField> Fields { get; set; }
}

public class DbObject
{
    public int Id { get; set; }
    public byte[] Bytes { get; set; }

    [Required]
    public DbCategory Category { get; set; }

    public TextDbFieldValue MainFieldValue { get; set; }
    public List<DbFieldValue> FieldsValues { get; set; }
}

public abstract class DbField
{
    public int Id { get; set; }

    [Required]
    public string Name { get; set; }

    [Required]
    public bool Required { get; set; }
}

public class IntegerDbField : DbField
{
    public int? Minimum { get; set; }
    public int? Maximum { get; set; }
}

public class FloatDbField : DbField
{
    public double? Minimum { get; set; }
    public double? Maximum { get; set; }
}
//... few other types

public abstract class DbFieldValue
{
    [Key]
    public int Id { get; set; }
    [Required]
    public DbField Field { get; set; }

    // String view of the typed value, overridden by each concrete value type
    [JsonIgnore]
    public abstract string Value { get; set; }
}

public class IntDbFieldValue : DbFieldValue
{
    public int? IntValue { get; set; }

    public override string Value
    {
        get { return IntValue?.ToString(); }
        set
        {
            if (value == null) IntValue = null;
            else IntValue = int.Parse(value);
        }
    }
} // and other FieldValue types

On my dev machine (i5, 16 GB RAM, SSD), with a SQL Server Express database holding 4 categories of 5-6 fields each and 10k records, the first query takes about 15 s. That first query is:

var result = db.Objects
     .Include(s => s.Category)
     .Include(s => s.Category.MainField)
     .Include(s => s.MainFieldValue.Field)
     .Include(s => s.FieldsValues.Select(f => f.Field))
     .Where(predicate ?? AlwaysTrue)
     .ToArray();

I do this to load everything into memory. After that, I work on the cached list and only write changes back to the database. The reason is that users can search with a filter on any FieldValue, and querying the database for every search proved much too slow; the in-memory part works well.
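A minimal sketch of that kind of in-memory filtering, written as a small top-level console program. It assumes a hypothetical, flattened cache in which each object's field values are keyed by field name; the real code keeps full DbObject/DbFieldValue entities, and `FilterByField`, the field names and the data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the cached DbObject graph:
// each object maps a field name to its string value (cf. DbFieldValue.Value).
var cache = new List<(int Id, Dictionary<string, string> Values)>
{
    (1, new Dictionary<string, string> { ["Weight"] = "12", ["Color"] = "red" }),
    (2, new Dictionary<string, string> { ["Weight"] = "40", ["Color"] = "blue" }),
    (3, new Dictionary<string, string> { ["Color"] = "red" }), // no Weight value
};

// Filter the in-memory cache on one field; no database round trip is involved.
// Objects that have no value for the field never match.
List<int> FilterByField(string field, Func<string, bool> matches) =>
    cache.Where(o => o.Values.TryGetValue(field, out var v) && matches(v))
         .Select(o => o.Id)
         .ToList();

// Example: all objects whose integer "Weight" field is at least 20.
var heavy = FilterByField("Weight", v => int.Parse(v) >= 20);
Console.WriteLine(string.Join(",", heavy)); // 2
```

Because every search runs against objects already in memory, its cost depends only on the size of the cache, which is why this part stays fast once the initial load has finished.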

The problem appears at larger scale. Some clients have defined 6 categories with 20+ fields each and store 70k+ records, and startup sometimes takes more than 15 minutes. Once started, there is no noticeable speed difference between 5k and 50k records.

Every technique I've found for improving EF Code First startup time focuses mostly on caching generated views, NGen-ing the EF assemblies and so on, but in this case startup time grows with the number of records, not the number of entity types.

I realise this is caused by the complexity of the schema, but is there some way to speed it up? Fortunately this is a Windows service, so once it has started it runs for weeks, but still.

Should I drop EF for the first load and do it in plain SQL? Should I load in batches? Should I switch from EF to NHibernate? Or something else? On virtualized servers, this program maxes out the CPU while executing this line (my application, not SQL Server).

I've tried loading only the objects and loading their properties later. This was slightly faster (though not noticeably) on small databases, and even slower on bigger ones. Any help is appreciated, even if the answer is "suck it up and wait".

I managed to cut the startup time caused by EF by a factor of three with these tricks:

  1. Update Entity Framework to 6.2 and enable model caching:

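    // EF 6.2+: save the built model to disk so later startups can reuse it.
    // Note: a Windows service's current directory is usually System32; AppDomain.CurrentDomain.BaseDirectory may be safer.
    public class CachingContextConfiguration : DbConfiguration
    {
        public CachingContextConfiguration() { SetModelStore(new DefaultDbModelStore(Directory.GetCurrentDirectory())); }
    }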

  2. Call ctx.Database.Initialize(false) explicitly on a new thread, as early as possible. This still takes 3-4 seconds, but since it runs alongside other startup work, it helps a lot.

  3. Load entities into the EF cache in a sensible order.

Previously, I just wrote Include after Include, which translates into multiple joins. I found a "rule of thumb" in some blog posts: EF performs reasonably well with up to two chained Includes, but each additional one slows everything down massively. I also found a blog post showing EF's caching behaviour: once an entity has been loaded with Include or Load, EF automatically puts it into the matching navigation property (though the blog author is wrong about the union of objects). So I did this:

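  using (var db = new MyContext())
  {
      // Load() and lambda Include() are extension methods from System.Data.Entity.
      // Pull fields, categories and field values into the context first...
      db.Fields.Load();
      db.Categories.Include(c => c.MainField).Include(x => x.Fields).Load();
      db.FieldValues.Load();
      // ...then EF's relationship fix-up attaches them to the objects as they are materialized.
      return db.Objects.Include(x => x.MainFieldValue.Field).ToArray();
  }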

This fetches the data about six times faster than the Include chain from the question. My understanding is that once the related entities are already loaded, EF does not query the database for them again; it simply takes them from its cache.
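The caching behaviour described above corresponds to EF's identity resolution plus relationship fix-up: the context keeps one tracked instance per key, and when a newly materialized entity refers to a key that is already tracked, the navigation property is filled with that instance. The toy model below only illustrates the idea with a dictionary and anonymous types; it is not EF's implementation (EF6 does this inside its state manager), and the Books/Tools data is made up.

```csharp
using System;
using System.Linq;

// Toy identity map (NOT EF's real implementation): one tracked instance per key.
// Step 1, like db.Categories.Load(): materialize categories and track them by key.
var trackedCategories = new[]
{
    new { Id = 1, Name = "Books" },
    new { Id = 2, Name = "Tools" },
}.ToDictionary(c => c.Id);

// Step 2, like querying Objects without Include: each row only carries a foreign key.
var rows = new[] { (Id: 10, CategoryId: 1), (Id: 11, CategoryId: 2), (Id: 12, CategoryId: 1) };

// Fix-up: the navigation property is resolved from the tracked instance,
// so no JOIN or extra query is needed to get the category data.
var objects = rows
    .Select(r => new { r.Id, Category = trackedCategories[r.CategoryId] })
    .ToList();

foreach (var o in objects)
    Console.WriteLine($"{o.Id} -> {o.Category.Name}");

// Objects 10 and 12 share a single Category instance, as tracked entities do.
Console.WriteLine(object.ReferenceEquals(objects[0].Category, objects[2].Category)); // True
```

In the real query, this means the wide rows for categories and fields are read once, instead of being repeated in every joined row for every object.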

  4. I also added this to my context constructor:

      Configuration.LazyLoadingEnabled = false;   // navigation properties are not lazy-loaded
      Configuration.ProxyCreationEnabled = false; // materialize plain POCOs instead of dynamic proxies

The effect of this is barely noticeable, but it may matter more on very large data sets.
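Step 2 (warming up the model early on a background thread) can be sketched as the small top-level console program below. It is a minimal sketch under stated assumptions: `initializeModel` and `DoOtherStartupWork` are hypothetical placeholders, `Thread.Sleep` stands in for the real work, and in the service the warm-up delegate would create a context and call the EF6 method `Database.Initialize(false)`, as the comment shows.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical placeholder for the expensive EF6 warm-up. In the real service this would be
//   () => { using (var ctx = new MyContext()) ctx.Database.Initialize(false); }
Action initializeModel = () => Thread.Sleep(200);

// Start the warm-up as early as possible on a thread-pool thread...
Task warmUp = Task.Run(initializeModel);

// ...and let the rest of startup run in parallel with it.
DoOtherStartupWork();

// Block only right before the first query that needs the model.
// Wait() rethrows any warm-up failure wrapped in an AggregateException.
warmUp.Wait();
Console.WriteLine($"Model ready: {warmUp.Status}");

// Hypothetical placeholder for the service's other startup steps.
void DoOtherStartupWork() => Thread.Sleep(100);
```

The gain comes from overlap: if nothing else happens between starting the task and waiting on it, running the initialization on another thread saves no time.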

I've also watched a presentation on EF Core by Rowan Miller, and I will switch to it in the next release; in some cases it's 5-6 times faster than EF6.

Hope this helps someone.
