Entity framework 6 code first: what is the best implementation for a baseobject with 10 childobjects

Question

We have a base object with 10 child objects, and we use EF6 Code First.

Of those 10 child objects, 5 have only a few (extra) properties, and 5 have multiple properties (5 to 20). We implemented this as table-per-type, so we have one table for the base and one per child (11 tables in total).

This, however, creates HUGE select queries with CASE statements and UNIONs all over the place, which also take EF six seconds to generate (the first time).

I have read about this issue, and the same problem applies to the table-per-concrete-type scenario.

So what we are left with is table-per-hierarchy, but that creates a table with a large number of properties, which doesn't sound great either.

Is there another solution for this?

I thought about skipping the inheritance and creating a union view for when I want to get all the items from all the child objects/records.

Any other thoughts?

Answer 1

Another solution would be to implement some kind of CQRS pattern where you have separate databases for writing (command) and reading (query). You could even de-normalize the data in the read database so it is very fast.

Assuming you need at least one normalized model with referential integrity, I think your decision really comes down to Table per Hierarchy versus Table per Type. Alex James from the EF team, and more recently Microsoft's Data Development site, report that TPH has better performance.

Advantages of TPT and why they're not as important as performance:

Greater flexibility, which means the ability to add types without affecting any existing table. Not too much of a concern because EF migrations make it trivial to generate the required SQL to update existing databases without affecting data.

Database validation on account of having fewer nullable fields. Not a massive concern because EF validates data according to the application model. If data is being added by other means it is not too difficult to run a background script to validate data. Also, TPT and TPC are actually worse for validation when it comes to primary keys because two sub-class tables could potentially contain the same primary key. You are left with the problem of validation by other means.

Storage space is reduced on account of not needing to store all the null fields. This is only a very trivial concern, especially if the DBMS has a good strategy for handling 'sparse' columns.

Design and gut-feel. Having one very large table does feel a bit wrong, but that is probably because most db designers have spent many hours normalizing data and drawing ERDs. Having one large table seems to go against the basic principles of database design. This is probably the biggest barrier to TPH. See this article for a particularly impassioned argument.

That article summarizes the core argument against TPH as:

It's not normalized even in a trivial sense, it makes it impossible to enforce integrity on the data, and what's most "awesome:" it is virtually guaranteed to perform badly at a large scale for any non-trivial set of data.

These are mostly wrong. Performance and integrity are discussed above, and TPH does not necessarily mean denormalized: there are just many (nullable) foreign key columns that are self-referential. So we can go on designing and normalizing the data exactly as we would with TPT. In a current database I have many relationships between sub-types and have created an ERD as if it were a TPT inheritance structure. This actually reflects the implementation in code-first Entity Framework. For example, here is my Expenditure class, which inherits from Relationship, which inherits from Content:

using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

/// <summary>
/// Inherits from Content: Id, Handle, Description, Parent (the context of the
/// expenditure, usually a Project).
/// Inherits from Relationship: Source (the Principal), SourceId, Target (the Supplier), TargetId.
/// </summary>
public class Expenditure : Relationship
{
    // Self-referencing relationship: the Product row lives in the same TPH table.
    [Required, InverseProperty("Expenditures"), ForeignKey("ProductId")]
    public Product Product { get; set; }
    public Guid ProductId { get; set; }

    public string Unit { get; set; }
    public double Qty { get; set; }
    public string Currency { get; set; }
    public double TotalCost { get; set; }
}

The InversePropertyAttribute and the ForeignKeyAttribute give EF the information it needs to make the required self-joins within the single table.

The Product type also maps to the same table (it also inherits from Content). Each Product has its own row in the table, and rows that contain Expenditures include data in the ProductId column, which is null for rows containing all other types. So the data is normalized, just placed in a single table.
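To make the single-table layout concrete, here is a minimal in-memory sketch in plain C# (no EF, no database) of how rows from one TPH table map back to typed objects. The `ContentRow` shape, the `"Product"`/`"Expenditure"` discriminator values and the `TphMaterializer` helper are hypothetical, and the hierarchy is simplified (Expenditure derives directly from Content, skipping Relationship). EF6 performs the equivalent step itself using its own discriminator column.

```csharp
using System;

// Hypothetical shape of one row in the single TPH table: every column of every
// subtype lives here, and columns that don't apply to a row's type are null.
public class ContentRow
{
    public Guid Id { get; set; }
    public string Discriminator { get; set; }   // which CLR type the row holds
    public string Handle { get; set; }
    public Guid? ProductId { get; set; }         // only set on Expenditure rows
    public double? Qty { get; set; }             // only set on Expenditure rows
}

public abstract class Content
{
    public Guid Id { get; set; }
    public string Handle { get; set; }
}

public class Product : Content { }

// Simplified: the real class derives from Relationship.
public class Expenditure : Content
{
    public Guid ProductId { get; set; }
    public double Qty { get; set; }
}

public static class TphMaterializer
{
    // Picks the concrete type from the discriminator and copies only the
    // columns that belong to that type.
    public static Content Materialize(ContentRow row)
    {
        switch (row.Discriminator)
        {
            case "Product":
                return new Product { Id = row.Id, Handle = row.Handle };
            case "Expenditure":
                return new Expenditure
                {
                    Id = row.Id,
                    Handle = row.Handle,
                    // Self-referencing FK: points at a Product row in the same table.
                    ProductId = row.ProductId.Value,
                    Qty = row.Qty.Value
                };
            default:
                throw new InvalidOperationException("Unknown discriminator: " + row.Discriminator);
        }
    }
}
```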

The beauty of using EF code first is we design the database in exactly the same way and we implement it in (almost) exactly the same way regardless of using TPH or TPT. To change the implementation from TPH to TPT we simply need to add an annotation to each sub-class, mapping them to new tables. So, the good news for you is it doesn't really matter which one you choose. Just build it, generate a stack of test data, test it, change strategy, test it again. I reckon you'll find TPH the winner.
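As a rough illustration of "add an annotation to each sub-class", the sketch below uses the standard `[Table]` attribute from `System.ComponentModel.DataAnnotations.Schema`, which EF6 Code First honours on derived types to map them TPT. The `TableMapping.TableFor` helper is hypothetical: a simplified, reflection-only stand-in for EF's convention (an unannotated subtype shares its base type's table), so it runs without EF or a database. The two hierarchies are alternative versions of the same model, shown side by side for comparison, not something you would put in one context.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;
using System.Reflection;

// TPH version: subclasses carry no [Table], so they share the base table.
[Table("Contents")]
public abstract class Content { public Guid Id { get; set; } }
public class Product : Content { }
public class Expenditure : Content { public double Qty { get; set; } }

// TPT version of the same shape: each subclass is annotated with its own table.
[Table("Contents")]
public abstract class TptContent { public Guid Id { get; set; } }
[Table("Products")]
public class TptProduct : TptContent { }
[Table("Expenditures")]
public class TptExpenditure : TptContent { public double Qty { get; set; } }

public static class TableMapping
{
    // Hypothetical, simplified rule: walk up the hierarchy until a [Table]
    // attribute is found; if none is found, use the root type's name
    // (EF itself would also apply its pluralizing convention).
    public static string TableFor(Type type)
    {
        Type t = type;
        while (true)
        {
            var attr = t.GetCustomAttribute<TableAttribute>(inherit: false);
            if (attr != null) return attr.Name;
            if (t.BaseType == null || t.BaseType == typeof(object)) return t.Name;
            t = t.BaseType;
        }
    }
}
```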

Answer 2

Having experienced similar problems myself, I have a few suggestions. I'm also open to improvements on them, as it's a complex topic and I don't have it all worked out.

Entity Framework can be very slow when dealing with non-trivial queries on complex entities, i.e. those with multiple levels of child collections. In some performance tests I've tried, it sits there for an awfully long time compiling the query. In theory, EF 5 onwards should cache compiled queries (even if the context gets disposed and re-instantiated) without you having to do anything, but I'm not convinced that this is always the case.

I've read suggestions that for a complex database you should create multiple DbContexts, each with only a smaller subset of your entities. If this is practical for you, give it a try! But I imagine there would be maintenance issues with this approach.

1) I know this is obvious, but it's worth saying anyway: make sure you have the right foreign keys set up in your database for related entities. Entity Framework will then keep track of these relationships and be much quicker at generating queries that need to join on the foreign key.

2) Don't retrieve more than you need. One-size-fits-all methods to get a complex object are rarely optimal. Say you are getting a list of base objects and you only need to display the name and ID of each one in the list. Retrieve just the base object; any navigation properties that aren't specifically needed should not be retrieved.

3) If the child objects are not collections, or they are collections but you only need one item (or an aggregate value such as the count) from them, I would absolutely implement a view in the database and query that instead. It is MUCH quicker. EF doesn't have to do any work: it's all done in the database, which is better equipped for this type of operation.

4) Be careful with .Include(); this goes back to point 2 above. If you are getting a single object plus a child collection property, you are better off not using .Include(), because the child collection is then retrieved in a separate query, so the base object's columns are not repeated for every row in the child collection.
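A back-of-the-envelope sketch of the trade-off in point 4: a single joined query (roughly what .Include() generates for a collection) returns one row per child, each repeating the parent's columns, while a separate query returns the parent's columns once. The helper below only does that arithmetic; the column and row counts are hypothetical inputs, and it ignores the cost of the extra round trip.

```csharp
using System;

public static class IncludeCost
{
    // One joined query: one row per child (or a single row if there are no
    // children), and every row repeats all of the parent's columns.
    public static long JoinedValues(int parentColumns, int childColumns, int childCount)
    {
        long rows = Math.Max(childCount, 1);
        return rows * (parentColumns + childColumns);
    }

    // Two queries: the parent's columns once, then one row per child.
    public static long SeparateValues(int parentColumns, int childColumns, int childCount)
    {
        return parentColumns + (long)childCount * childColumns;
    }
}
```

For example, a hypothetical 40-column parent with 200 five-column children gives `JoinedValues(40, 5, 200)` = 9,000 values versus `SeparateValues(40, 5, 200)` = 1,040. With only a few children the gap shrinks, and the extra round trip may matter more.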

EDIT

Following the comments, here are some further thoughts.

As we are dealing with an inheritance hierarchy, it makes logical sense to store a table for the base class plus separate tables for the additional properties of the inheriting classes. How to make Entity Framework perform well with that, though, is still up for debate.

I've used EF for a similar scenario (with fewer children, and Database First), but in that case I didn't use the Entity Framework generated classes as the business objects. The EF objects mapped directly to the DB tables.

I created separate business classes for the base and inheriting classes, and a set of mappers that converted the EF objects to them. A query would look something like this:

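using System.Collections.Generic;
using System.Linq;

public static List<BaseClass> GetAllItems()
{
  using (var db = new MyDbEntities())
  {
    // Each child table is queried on its own (joined only to the base table),
    // then mapped to the hand-written business classes.
    var q1 = db.InheritedClass1.Include("BaseClass").ToList()
       .ConvertAll(x => (BaseClass)InheritedClass1Mapper.MapFromContext(x));
    var q2 = db.InheritedClass2.Include("BaseClass").ToList()
       .ConvertAll(x => (BaseClass)InheritedClass2Mapper.MapFromContext(x));

    // Concat rather than Union: Union would de-duplicate via Equals,
    // which is unnecessary here and could drop items if Equals is overridden.
    return q1.Concat(q2).ToList();
  }
}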

Not saying this is the best approach, but it might be a starting point? The queries are certainly quick to compile in this case!
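The mapper classes are not shown above. Here is a minimal sketch of what `InheritedClass1Mapper` might look like, assuming hypothetical EF-generated (Database First) classes `BaseClassEntity` and `InheritedClass1Entity`, where the child row reaches its base row through a `BaseClass` navigation property, and hand-written business classes with real inheritance. The `Entity` suffix is used here only to keep the EF classes apart from the business classes; all property names are illustrative.

```csharp
// --- Hypothetical EF-generated (Database First) classes: one per table ---
public class BaseClassEntity
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class InheritedClass1Entity
{
    public int Id { get; set; }                     // same key as the base row
    public string Colour { get; set; }             // child-specific column
    public BaseClassEntity BaseClass { get; set; } // navigation to the base row
}

// --- Hand-written business classes with real inheritance ---
public abstract class BaseClass
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class InheritedClass1 : BaseClass
{
    public string Colour { get; set; }
}

public static class InheritedClass1Mapper
{
    // Flattens the base row and the child row into one business object.
    // Assumes BaseClass was loaded (e.g. via Include("BaseClass")).
    public static InheritedClass1 MapFromContext(InheritedClass1Entity e)
    {
        return new InheritedClass1
        {
            Id = e.Id,
            Name = e.BaseClass.Name,
            Colour = e.Colour
        };
    }
}
```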

Comments welcome!

Answer 3

With Table per Hierarchy you end up with only one table, so your CRUD operations will obviously be faster, and this table is abstracted away by your domain layer anyway. The disadvantage is that you lose the ability to use NOT NULL constraints, so this needs to be handled properly by your business layer to avoid potential data integrity problems. Also, adding or removing entities means that the table changes, but that is also manageable.
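Since the database can't enforce NOT NULL on a subtype-only column in a TPH table, here is a sketch of moving that rule into the business layer with standard data annotations. `Invoice`, `Reference`, `Amount` and the `BusinessRules` helper are hypothetical; `[Required]`, `IValidatableObject` and `Validator.TryValidateObject` are the standard `System.ComponentModel.DataAnnotations` types, and EF6's DbContext validation applies the same kind of checks when you call SaveChanges.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public abstract class BaseObject
{
    public int Id { get; set; }
}

// Hypothetical subtype. In a TPH table its columns must be nullable because
// rows of the other subtypes have no value for them, so "required" lives here.
public class Invoice : BaseObject, IValidatableObject
{
    [Required]
    public string Reference { get; set; }

    public decimal? Amount { get; set; }

    // Rule the shared table cannot express for this subtype alone.
    public IEnumerable<ValidationResult> Validate(ValidationContext context)
    {
        if (Amount == null || Amount <= 0)
            yield return new ValidationResult("Amount must be positive.", new[] { nameof(Amount) });
    }
}

public static class BusinessRules
{
    // Runs the attribute checks first; IValidatableObject.Validate is only
    // called when those pass.
    public static bool IsValid(object entity, out List<ValidationResult> errors)
    {
        errors = new List<ValidationResult>();
        return Validator.TryValidateObject(entity, new ValidationContext(entity), errors, validateAllProperties: true);
    }
}
```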

With Table per type you have the problem that the more classes in the hierarchy you have, the slower your CRUD operations will become.

All in all, as performance is probably the most important consideration here and you have a lot of classes, I think Table per Hierarchy is the winner in terms of both performance and simplicity.

Also look at this article, more specifically at section 7.1.1 (Avoiding TPT in Model First or Code First applications), where they state: "when creating an application using Model First or Code First, you should avoid TPT inheritance for performance concerns."

Answer 4

The EF6 Code First model I'm working on uses generics and an abstract base class called "BaseEntity". I also use generics and a base class for the EntityTypeConfiguration classes.

When I need to reuse a couple of properties ("columns") on some tables and it doesn't make sense for them to be on BaseEntity or BaseEntityWithMetaData, I make an interface for them.

For example, I have one for addresses that I haven't finished yet. If an entity has address information, it implements IAddressInfo. Casting an entity to IAddressInfo gives me an object with just the address information on it.

Originally I had my metadata columns in their own table. But, as others have mentioned, the queries were horrendous, and it was slower than slow. So I thought: why not use multiple inheritance paths, so that the columns are on every table that needs them and not on the ones that don't? Also, I am using MySQL, which has a limit of 4096 columns per table; SQL Server 2008 has 1024. Even at 1024, I don't see realistic scenarios for going over that on one table.

And none of my objects inherit in such a way that they get columns they don't need. When that need arises, I create a new base class at a level that prevents the extra columns.

Here are enough snippets from my code to show how I have my inheritance set up. So far it works really well for me; I haven't yet come across a scenario I couldn't model with this setup.
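A minimal sketch of the interface approach, assuming a hypothetical `IAddressInfo` with `Street`, `City` and `PostalCode` (the answer doesn't show its members) and two hypothetical entities. Each implementing entity carries the address columns on its own table, and code that only cares about addresses works against the interface.

```csharp
// Hypothetical members: the real IAddressInfo isn't shown in the answer.
public interface IAddressInfo
{
    string Street { get; set; }
    string City { get; set; }
    string PostalCode { get; set; }
}

// Entities that need address columns implement the interface, so the columns
// end up on their own tables and nowhere else.
public class Customer : IAddressInfo
{
    public long CustomerId { get; set; }
    public string Name { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string PostalCode { get; set; }
}

public class Warehouse : IAddressInfo
{
    public long WarehouseId { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string PostalCode { get; set; }
}

public static class AddressFormatter
{
    // Works with any entity through its IAddressInfo view.
    public static string OneLine(IAddressInfo address)
    {
        return address.Street + ", " + address.PostalCode + " " + address.City;
    }
}
```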

using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations.Schema;
using System.Data.Entity.ModelConfiguration;
using System.Linq;

// Base configuration for every entity; shared fluent mappings go here.
public abstract class BaseEntityConfig<T> : EntityTypeConfiguration<T> where T : BaseEntity<T>, new()
{
}

public abstract class BaseEntity<T> where T : BaseEntity<T>, new()
{
   //shared properties here
}

// Configures the metadata navigation properties once for every entity that has them.
public abstract class BaseEntityWithMetaDataConfig<T> : BaseEntityConfig<T> where T : BaseEntityWithMetaData<T>, new()
{
    protected BaseEntityWithMetaDataConfig()
    {
        this.HasOptional(e => e.RecCreatedBy).WithMany().HasForeignKey(p => p.RecCreatedByUserId);
        this.HasOptional(e => e.RecLastModifiedBy).WithMany().HasForeignKey(p => p.RecLastModifiedByUserId);
    }
}

public abstract class BaseEntityWithMetaData<T> : BaseEntity<T> where T : BaseEntityWithMetaData<T>, new()
{
    #region Entity Properties
    public DateTime? DateRecCreated { get; set; }
    public DateTime? DateRecModified { get; set; }

    public long? RecCreatedByUserId { get; set; }
    public virtual User RecCreatedBy { get; set; }
    public virtual User RecLastModifiedBy { get; set; }
    public long? RecLastModifiedByUserId { get; set; }
    public DateTime? RecDateDeleted { get; set; }
    #endregion
}

// Person-specific mappings. The metadata mappings are applied by the
// base-class constructor, which runs before this one.
public class PersonConfig : BaseEntityWithMetaDataConfig<Person>
{
    public PersonConfig()
    {
        this.ToTable("people");
        this.HasKey(e => e.PersonId);
        this.HasOptional(e => e.User).WithRequired(p => p.Person).WillCascadeOnDelete(true);
        this.HasOptional(p => p.Employee).WithRequired(p => p.Person).WillCascadeOnDelete(true);
        this.HasMany(e => e.EmailAddresses).WithRequired(p => p.Person).WillCascadeOnDelete(true);

        this.Property(e => e.FirstName).IsRequired().HasMaxLength(128);
        this.Property(e => e.MiddleName).IsOptional().HasMaxLength(128);
        this.Property(e => e.LastName).IsRequired().HasMaxLength(128);
    }
}

// I have to use this pattern to allow other classes to inherit from Person; they have to inherit from BasePerson<T>
public class Person : BasePerson<Person>
{
    //Just a dummy class to expose BasePerson as it is.
}

public class BasePerson<T> : BaseEntityWithMetaData<T> where T: BasePerson<T>, new()
{
    #region Entity Properties       
    public long PersonId { get; set; } 
    public virtual User User { get; set; }

    public string FirstName { get; set; }

    public string MiddleName { get; set; }

    public string LastName { get; set; }

    public virtual Employee Employee { get; set; }

    public virtual ICollection<PersonEmail> EmailAddresses { get; set; }
    #endregion

    #region Entity Helper Properties
    [NotMapped]
    public PersonEmail PrimaryPersonalEmail
    {
        get
        {
            PersonEmail ret = null;
            if (this.EmailAddresses != null)
                ret = (from e in this.EmailAddresses where e.EmailAddressType == EmailAddressType.Personal_Primary select e).FirstOrDefault();
            return ret;
        }
    }
    [NotMapped]
    public PersonEmail PrimaryWorkEmail
    {
        get
        {
            PersonEmail ret = null;
            if (this.EmailAddresses != null)
                ret = (from e in this.EmailAddresses where e.EmailAddressType == EmailAddressType.Work_Primary select e).FirstOrDefault();
            return ret;
        }
    }

    private string _DefaultEmailAddress = null;
    [NotMapped]
    public string DefaultEmailAddress
    {
        get
        {
            if (string.IsNullOrEmpty(_DefaultEmailAddress))
            {
                PersonEmail personalEmail = this.PrimaryPersonalEmail;
                if (personalEmail != null && !string.IsNullOrEmpty(personalEmail.EmailAddress))
                    _DefaultEmailAddress = personalEmail.EmailAddress;
                else
                {
                    PersonEmail workEmail = this.PrimaryWorkEmail;
                    if (workEmail != null && !string.IsNullOrEmpty(workEmail.EmailAddress))
                        _DefaultEmailAddress = workEmail.EmailAddress;
                }
            }
            return _DefaultEmailAddress;
        }
    }

    #endregion

    #region Constructor
    static BasePerson()
    {            
    }
    public BasePerson()
    {
        this.User = null;
        this.EmailAddresses = new HashSet<PersonEmail>();
    }
    // Chain to the default constructor so EmailAddresses is initialized here too.
    public BasePerson(string firstName, string lastName) : this()
    {
        this.FirstName = firstName;
        this.LastName = lastName;
    }
    #endregion

}

Now, the code in the context's OnModelCreating method looks like this:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Config: keep singular table names
    // (DbModelBuilder is in System.Data.Entity; the convention in System.Data.Entity.ModelConfiguration.Conventions)
    modelBuilder.Conventions.Remove<PluralizingTableNameConvention>();

    // Register the configuration classes. Each one tells Entity Framework how to map its entity:
    // table names, foreign key constraints, unique constraints, relationships, etc.
    modelBuilder.Configurations.Add(new PersonConfig());
    modelBuilder.Configurations.Add(new PersonEmailConfig());
    modelBuilder.Configurations.Add(new UserConfig());
    modelBuilder.Configurations.Add(new LoginSessionConfig());
    modelBuilder.Configurations.Add(new AccountConfig());
    modelBuilder.Configurations.Add(new EmployeeConfig());
    modelBuilder.Configurations.Add(new ContactConfig());
    modelBuilder.Configurations.Add(new ConfigEntryCategoryConfig());
    modelBuilder.Configurations.Add(new ConfigEntryConfig());
    modelBuilder.Configurations.Add(new SecurityQuestionConfig());
    modelBuilder.Configurations.Add(new SecurityQuestionAnswerConfig());
}

The reason I created base classes for the configuration of my entities was that when I started down this path I ran into an annoying problem: I had to configure the shared properties for every derived class over and over again, and if I updated one of the fluent API mappings, I had to update the code in every derived class.

By using inheritance on the configuration classes instead, the two properties are configured in one place and inherited by the configuration class for each derived entity.

So when PersonConfig is constructed, the BaseEntityWithMetaDataConfig constructor runs first and configures the two properties, and the same happens again when UserConfig is constructed, and so on.
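The key point is ordinary C# constructor chaining: the base configuration's constructor runs before the derived one, so shared mappings are written once. The sketch below shows that order without EF; `RecordingConfig<T>` is a hypothetical stand-in for `EntityTypeConfiguration<T>` that merely records the mapping calls as strings, and the entity and config names are illustrative.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for EntityTypeConfiguration<T>: it only records
// the mapping calls so the order in which constructors run is visible.
public abstract class RecordingConfig<T>
{
    public List<string> Calls { get; } = new List<string>();
    protected void Map(string rule) { Calls.Add(typeof(T).Name + ": " + rule); }
}

// Shared metadata mappings, written once.
public abstract class MetaDataConfig<T> : RecordingConfig<T>
{
    protected MetaDataConfig()
    {
        Map("RecCreatedBy -> RecCreatedByUserId");
        Map("RecLastModifiedBy -> RecLastModifiedByUserId");
    }
}

public class PersonEntity { }
public class UserEntity { }

// Each derived config adds only its own rules; the base constructor has already run.
public class PersonRules : MetaDataConfig<PersonEntity>
{
    public PersonRules() { Map("ToTable(people)"); }
}

public class UserRules : MetaDataConfig<UserEntity>
{
    public UserRules() { Map("ToTable(users)"); }
}
```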

Answer 5

The three approaches go by different names in Martin Fowler's terminology:

Single Table Inheritance - the whole inheritance hierarchy is held in one table. No joins, optional columns for the child types. You need a way (typically a discriminator column) to tell which child type each row is.
Concrete Table Inheritance - one table for each concrete type. Joins, no optional columns. In this case, a base type table is needed only if the base type needs its own mapping (i.e. instances of it can be created).
Class Table Inheritance - a base type table plus child tables, each adding only its additional columns to the base's columns. Joins, no optional columns. In this case, the base type table always contains a row for each child; however, you can retrieve the common columns alone only when no child-specific columns are needed (the rest could perhaps come via lazy loading).
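To relate the three patterns to the question's 10 child types, here is a small sketch counting how many tables a polymorphic "give me every item" query has to read under each one. It assumes an abstract base type (so concrete table inheritance has no base table) and says nothing about row counts or indexes, which is why measuring still matters.

```csharp
using System;

public enum InheritanceStrategy { SingleTable, ConcreteTable, ClassTable }

public static class PolymorphicQueryShape
{
    // Tables read by a query over the whole hierarchy, for one abstract base
    // type and `childTypes` concrete subtypes.
    public static int TablesRead(InheritanceStrategy strategy, int childTypes)
    {
        switch (strategy)
        {
            case InheritanceStrategy.SingleTable:
                return 1;                  // one table, split by discriminator
            case InheritanceStrategy.ConcreteTable:
                return childTypes;         // one table per subtype, combined with UNION ALL
            case InheritanceStrategy.ClassTable:
                return 1 + childTypes;     // base table joined with every subtype table
            default:
                throw new ArgumentOutOfRangeException(nameof(strategy));
        }
    }
}
```

For 10 child types that is 1, 10 and 11 tables respectively.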

All approaches are workable; it depends only on the amount and structure of the data you have, so measure the performance differences first.

The choice will be based on the number of joins vs. data distribution vs. optional columns.

If you don't have (and aren't going to have) many child types, I would go with class table inheritance, since it stays close to the domain and will be easy to translate/map.
If you have many child tables to work with at the same time and anticipate a bottleneck in joins - go with single table inheritance.
If joins are not needed at all and you are going to work with one concrete type at a time - go with concrete table inheritance.

Answer 6

Although Table per Hierarchy (TPH) is the better approach for fast CRUD operations, with it you cannot avoid having a single database table with a great many properties. The CASE and UNION clauses you mentioned are created because the resulting query is effectively requesting a polymorphic result set that includes multiple types.

However, when EF returns a flattened table that includes the data for all the types, it does extra work to ensure that null values are returned for columns that may be irrelevant for a particular type. Technically, this extra validation using CASE and UNION is not necessary. The issue below is a performance glitch in EF6, which Microsoft was aiming to fix in a future release.

The following query:

SELECT
[Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Name] AS [Name],
[Extent1].[Address] AS [Address],
[Extent1].[City] AS [City],
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[State] END AS [C2],
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[Zip] END AS [C3]
FROM [dbo].[Customers] AS [Extent1]

can be safely replaced by:

SELECT
[Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Name] AS [Name],
[Extent1].[Address] AS [Address],
[Extent1].[City] AS [City],
[UnionAll1].[State] AS [C2],
[UnionAll1].[Zip] AS [C3]
FROM [dbo].[Customers] AS [Extent1]

So, given this flaw in the current release of Entity Framework 6, your options are to use either a Model First approach or TPH.


6 answers

solution1
8 ACCEPTED 2014-07-03 02:15:51

solution2
4 2014-06-27 11:38:02

solution3
4 2014-07-01 11:29:05

solution4
3 2014-07-07 18:36:39

solution5
2 2014-07-07 17:46:43

solution6
2 2014-07-08 05:58:43
