
Best way to save a static list using SQL Server, EF6

I'm starting a rewrite of an existing application. One of the current performance bottlenecks is saving user-generated lists of Employees as static lists, so that users can come back and view the same list of employees at a later date.

Below is a simple example of the functionality I am looking for. The real list would be generated by a more complex query than the one in the example.

Scenario: A user searches for all Employees on night shift and wants to save this list to load it later. They want the list to always return the result as it was the first time they ran the search, i.e. if a new employee is added to the night shift, that employee should not appear on the list when they pull it up.


What I have tried:

Currently, there is a very poor solution that stores all of the IDs in the resulting list as a string array and then rebuilds a query using those IDs. This is very inefficient and causes problems with large lists, because the query ends up with too many parameters.

I've also played around with constructing a table from a saved array of IDs and then joining. However, this is extremely slow on large lists (20,000+ employees) and often results in timeouts.

My current thought is to create a new table for each list and then JOIN that table to the Employee table. However, if 100 users each save 10 large lists (20,000+ employees), it quickly becomes a storage and table management nightmare.

I assume that this is a fairly common problem with a solution. But, I haven't been able to find any sort of examples or best practices on how to store static lists (I'm probably searching for the wrong thing). Does anyone have any general concepts on how to best handle this type of use case scenario?

Update: I think I tried the following setup years ago, but it wasn't working for some reason or another. Looking back, it seems to make the most sense. I think the problem was NHibernate having issues with subselects in LINQ, but that is no longer a limitation.

I'm thinking of a StaticSavedLists table plus an index table that links each list to Person (Employee in the previous example) in a many-to-many mapping. The classes in C# would look like this:

public class StaticSavedList : BaseModel
{
    public string Name { get; set; }
    public IList<StaticSavedListPersonIdx> PersonsIdx { get; set; } //Has many persons
}

public class StaticSavedListPersonIdx : BaseModel
{
    public StaticSavedList StaticSavedList { get; set; }
    public Person Person { get; set; }
}

You probably need one table containing the "header" detail for the searches, which will include an ID for each search, and a second table with one row per entry in each search. Then you just need to get the search ID somehow (maybe by user ID and date of shift) and use it to join the results table to the employees table.

Here's how a schema could look:

ShiftSearches

SearchID    int (PK)
ShiftDate   datetime

SearchResults

SearchID    int (PK)
EmployeeID  int (PK)

Employees

EmployeeID  int (PK)
FirstName   varchar
etc ...

Possible LINQ Query

DateTime shiftDate = new DateTime(2014, 11, 26);
// Single() throws unless exactly one search exists for that date
int searchId = db.ShiftSearches.Single(s => s.ShiftDate == shiftDate).SearchID;

// Join the saved IDs back to the Employees table
var results = from r in db.SearchResults
              where r.SearchID == searchId
              join e in db.Employees on r.EmployeeID equals e.EmployeeID
              select e;

This way you only need one results table for all saved lists, rather than one table per list, and it is very thin, so it should not take up much room. As you are just storing the search and employee IDs, you can't really get the data any smaller anyway.
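The snapshot behaviour of this design can be illustrated without a database. The following is only a minimal in-memory sketch using LINQ to Objects: the class names follow the schema above, while SavedListDemo, Save, Load and the Shift field are hypothetical names made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins for the Employees and SearchResults tables.
public class Employee { public int EmployeeID; public string FirstName; public string Shift; }
public class SearchResult { public int SearchID; public int EmployeeID; }

public static class SavedListDemo
{
    // Snapshot the employees matched by a search as (SearchID, EmployeeID) rows.
    public static List<SearchResult> Save(int searchId, IEnumerable<Employee> matches)
    {
        return matches
            .Select(e => new SearchResult { SearchID = searchId, EmployeeID = e.EmployeeID })
            .ToList();
    }

    // Reload a saved list by joining the stored IDs back to the employees.
    public static List<Employee> Load(int searchId,
                                      IEnumerable<SearchResult> results,
                                      IEnumerable<Employee> employees)
    {
        return (from r in results
                where r.SearchID == searchId
                join e in employees on r.EmployeeID equals e.EmployeeID
                select e).ToList();
    }

    public static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { EmployeeID = 1, FirstName = "Ann", Shift = "Night" },
            new Employee { EmployeeID = 2, FirstName = "Bob", Shift = "Day" },
            new Employee { EmployeeID = 3, FirstName = "Cy",  Shift = "Night" },
        };

        // The user saves the night-shift search as list 1.
        var saved = Save(1, employees.Where(e => e.Shift == "Night"));

        // A new night-shift employee is added after the list was saved.
        employees.Add(new Employee { EmployeeID = 4, FirstName = "Dee", Shift = "Night" });

        // Prints Ann and Cy; Dee is not part of the saved list.
        foreach (var e in Load(1, saved, employees))
            Console.WriteLine(e.FirstName);
    }
}
```

Because only the IDs captured at save time are stored, later changes to who is on the night shift do not affect the saved list.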

The class structure you posted pretty much matches this concept.
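For EF6 code first, one possible way to express the two list tables is with data annotations. This is only a sketch: it assumes the table and column names from the schema above, uses a composite key on SearchResult instead of the surrogate key your BaseModel presumably provides, and leaves out the DbContext (which needs the EntityFramework package).

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

// Header row: one per saved search.
[Table("ShiftSearches")]
public class ShiftSearch
{
    [Key]
    public int SearchID { get; set; }

    public DateTime ShiftDate { get; set; }
}

// Detail row: one per (search, employee) pair.
[Table("SearchResults")]
public class SearchResult
{
    // Composite primary key; EF6 requires an explicit column order for it.
    [Key, Column(Order = 0)]
    public int SearchID { get; set; }

    [Key, Column(Order = 1)]
    public int EmployeeID { get; set; }
}
```

Keeping SearchID first in the key means all rows of one saved list are addressed by the leading key column, which matches how the lists are read back.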

Entity Framework may not be the appropriate choice of technology in all cases, and dealing with batches of 20k rows at a time might be one of these cases.

However, I believe your design of the data model is a good one. Using SQL only, the script below shows that 60k rows containing (ListId, EmployeeId) pairs can be inserted into a sensibly clustered ListSearchEmployee table in under one second, and that one of the 20k-row lists can subsequently be joined back to the full Employee rows in about 1.8 seconds from a cold start.

The performance bottleneck is more likely to be the original user search - presumably this can be a nearly arbitrary query executed against your Employees + related tables, which will be difficult to index for all permutations.

Some performance suggestions (for the List save + refetch):

  • Use SqlBulkCopy to bulk dump EmployeeId's into the List tables
  • Use something low level, like a basic SqlDataReader, to fetch the list data back for the UI (although I guess it will be paged? If so, a DbSet.SqlQuery with AsNoTracking(), i.e. with change tracking turned off, might suffice).
  • Don't bother with Foreign Keys on the List table - this will slow down inserts.
  • Don't try and synchronously cleanup unwanted old List searches - queue them for deletion and have a background process to do the deletion.
  • As a result, the ListSearchEmployee table will frequently need reindexing due to the large amount of churn on it.

-- Sample data setup - not considered in the timing
CREATE TABLE Employee
(
  EmployeeID INT identity(1,1),
  Name NVARCHAR(100),
  SomeOtherFieldToLessenTheDensityOfEmployee CHAR(500),

  PRIMARY KEY(EmployeeID)
);

CREATE TABLE ListSearch
(
  ListSearchID INT IDENTITY(1,1) PRIMARY KEY
  -- Other fields you may want to identify the search, e.g. date, which user, which filters etc
)

CREATE TABLE ListSearchEmployee
(
  ListSearchID INT,
  EmployeeID INT, -- Don't bother Foreign Keying for performance

  PRIMARY KEY CLUSTERED (ListSearchID, EmployeeID)
);

-- Insert 1M Employees
WITH cteData AS
(
   SELECT top 1000000 sc1.name, ROW_NUMBER() OVER (ORDER BY sc1.object_id) AS rn
   FROM sys.columns sc1 CROSS JOIN sys.columns sc2 CROSS JOIN sys.columns sc3
)
INSERT INTO Employee(Name)
SELECT name + CAST(rn AS VARCHAR)
FROM cteData;

-- Timing : 0.972 seconds on SQLExpress 2012 on an i3
-- Inserting 3 x 20k lists of pseudo-random employees (not contiguous on the EmployeeID cluster)
WITH cteData AS
(
   SELECT top 20000 1 as listid, ROW_NUMBER()  OVER (ORDER BY sc1.object_id) * 50 AS empid
   FROM sys.columns sc1 CROSS JOIN sys.columns sc2

  UNION ALL

   SELECT top 20000 2 as listid, ROW_NUMBER()  OVER (ORDER BY sc1.object_id) * 30 AS empid
   FROM sys.columns sc1 CROSS JOIN sys.columns sc2

  UNION ALL

   SELECT top 20000 3 as listid, ROW_NUMBER()  OVER (ORDER BY sc1.object_id) * 41 AS empid
   FROM sys.columns sc1 CROSS JOIN sys.columns sc2

)
INSERT INTO ListSearchEmployee(ListSearchID, EmployeeID)
SELECT listid, empid
FROM cteData;

DBCC DROPCLEANBUFFERS;

-- Timing : 1.751 seconds on SQLExpress 2012 on an i3
-- Joining 20k rows
SELECT * 
FROM ListSearchEmployee el INNER JOIN Employee e on el.EmployeeID = e.EmployeeID
WHERE el.ListSearchID = 2
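
From the application side, the SqlBulkCopy and SqlDataReader suggestions could look roughly like the following. It is a hedged sketch rather than a tuned implementation: it assumes the ListSearchEmployee and Employee tables from the script above and the classic System.Data.SqlClient provider, and SavedListStore, ToTable, Save and LoadNames are hypothetical names. The connection string is whatever your application already uses.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class SavedListStore
{
    // Build an in-memory table shaped like ListSearchEmployee.
    public static DataTable ToTable(int listSearchId, IEnumerable<int> employeeIds)
    {
        var table = new DataTable();
        table.Columns.Add("ListSearchID", typeof(int));
        table.Columns.Add("EmployeeID", typeof(int));
        foreach (var id in employeeIds)
            table.Rows.Add(listSearchId, id);
        return table;
    }

    // Bulk insert the (ListSearchID, EmployeeID) pairs in one operation.
    public static void Save(string connectionString, int listSearchId, IEnumerable<int> employeeIds)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "ListSearchEmployee";
                bulk.ColumnMappings.Add("ListSearchID", "ListSearchID");
                bulk.ColumnMappings.Add("EmployeeID", "EmployeeID");
                bulk.WriteToServer(ToTable(listSearchId, employeeIds));
            }
        }
    }

    // Read a saved list back with a plain data reader (no change tracking).
    public static List<string> LoadNames(string connectionString, int listSearchId)
    {
        const string sql =
            "SELECT e.Name FROM ListSearchEmployee el " +
            "INNER JOIN Employee e ON el.EmployeeID = e.EmployeeID " +
            "WHERE el.ListSearchID = @listSearchId";

        var names = new List<string>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.Add("@listSearchId", SqlDbType.Int).Value = listSearchId;
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    names.Add(reader.IsDBNull(0) ? null : reader.GetString(0));
            }
        }
        return names;
    }
}
```

For a paged UI, LoadNames would need an ORDER BY with OFFSET/FETCH (available from SQL Server 2012) added to the query; that is left out here to keep the sketch short.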
