How to efficiently load data with many-to-many relationships in EF?

I have the following model:

public class Person
{
    public int Id { get; set; }
    public virtual ICollection<Category> Categories { get; set; }
}

public class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
}

Basically, one person can belong to multiple categories. This model caused EF to create a CategoryPerson { PersonId, CategoryID } table. Now I want to display all persons with all their categories in a list. The naive approach:

var people = context.People.ToList();
foreach (var p in people)
{
    Console.WriteLine("Person {0}, categories: {1}", p.Id, string.Join("|", p.Categories.Select(x => x.Name)));
}

results in 1 + N requests to the database.

If I use Include as follows:

var people = context.People.Include(x => x.Categories).ToList();
foreach (var p in people)
{
    Console.WriteLine("Person {0}, categories: {1}", p.Id, string.Join("|", p.Categories.Select(x => x.Name)));
}

I do get only 1 request; however, it is a join of 2 tables, and if a Person record is heavy and has multiple associated categories, the same heavy data is returned multiple times:

{ person1, category1 } 
{ person1, category2 } 
{ person1, category3 } 

etc.

Ideally, I want 2 requests to be made to the database: one to get all categories and another to get all persons. These 2 arrays should then be joined in memory, so that when I enumerate Person.Categories it does NOT go to the database but uses the preloaded data instead. Can this be achieved with EF?

EF won't be able to do this for you. But it will expect/create a foreign key like Person_Id on Category in the table's schema. If you add this to Category then you can do the join in memory:

public class Person
{
    public int Id { get; set; }
    public virtual ICollection<Category> Categories { get; set; }
}

public class Category
{
    public int Id { get; set; }
    public int Person_Id { get; set; }
    public string Name { get; set; }
}

var people = context.People.ToList();
var categories = context.Categories.ToList();

foreach (var p in people)
{
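    // Compare against the person's Id (not the category's own Id), and
    // materialize to a list so the result fits ICollection<Category>.
    p.Categories = categories.Where(c => c.Person_Id == p.Id).ToList();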
}

First, I strongly recommend including the foreign key in your model. It is good practice and avoids blind navigations. You need to add PersonId to the Category class.

Second, EF 5.0 (I'm not sure about older versions) supports loading a DbSet<T> entirely into the context via the Load method. After filling a DbSet, you can query its Local property to work with the in-memory entities only.

context.People.Load();
context.Categories.Load();

var q = (from p in context.People.Local
        join c in context.Categories.Local
        on p.Id equals c.PersonId
        select p
        ).ToList(); //--> No round trip to DataBase

Your idea would work for one-to-many (or one-to-one) relationships because they have a foreign key in one of the tables and EF would load this FK (no matter if you expose it as model property or not). EF is then able to rebuild the object graph in memory based on the PKs and the loaded FKs (that's called "relationship fixup").

However, it doesn't work for many-to-many relationships because neither the Person nor the Category table has a foreign key to the other table. The FKs are in the link table CategoryPerson. No column from this table gets loaded when you just load "flat" data from the Person and Category tables without the related data. After loading that data, there is simply no information in memory that could tell EF which Person belongs to which Categories and vice versa.

To create the correct relationships in memory you would have to load the link table as the third table...

var linkRecords = context.People.SelectMany(p => p.Categories.Select(c => new
{
    PersonId = p.Id,
    CategoryId = c.Id
}))
.ToList();

(I believe this is a relatively cheap SQL query that only fetches data from the link table without any join.)

...and then build the navigation collections manually in memory based on the linkRecords and the PKs of the already loaded Person and Category entities. EF won't help here because the link table records are not entities. linkRecords is just an "ad-hoc" collection of objects in memory that holds pairs of keys and EF doesn't have any metadata about the underlying type of this collection.
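The manual step described above could look roughly like the following. This is only a sketch with plain in-memory classes and no EF calls: ManyToManyStitcher and Attach are hypothetical names, and the key pairs are passed as value tuples, so the anonymous linkRecords would first need to be projected into that shape (for example with Select(l => (l.PersonId, l.CategoryId))).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public ICollection<Category> Categories { get; set; } = new List<Category>();
}

public class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper, not part of EF.
public static class ManyToManyStitcher
{
    // Fills Person.Categories from (PersonId, CategoryId) key pairs
    // using only data that is already in memory.
    public static void Attach(
        IEnumerable<Person> people,
        IEnumerable<Category> categories,
        IEnumerable<(int PersonId, int CategoryId)> links)
    {
        // Index categories by primary key for O(1) lookups.
        var categoriesById = categories.ToDictionary(c => c.Id);

        // Group the category ids by person id.
        var categoryIdsByPerson = links.ToLookup(l => l.PersonId, l => l.CategoryId);

        foreach (var person in people)
        {
            // A missing key in the lookup yields an empty sequence.
            person.Categories = categoryIdsByPerson[person.Id]
                .Where(id => categoriesById.ContainsKey(id)) // skip links to categories that were not loaded
                .Select(id => categoriesById[id])
                .ToList();
        }
    }
}
```

With real EF entities there are likely extra concerns that this sketch ignores: lazy loading would probably have to be disabled so that reading Person.Categories does not still hit the database, and assigning collections on tracked entities may be interpreted by the change tracker as new relationships, so a read-only or untracked query is the safer assumption.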

The whole procedure might be more efficient for not too large tables - or it might not. I really could not tell without measurements.
