
Best way to save a static list using SQL Server, EF6

I'm starting a rewrite of an existing application. One of the current performance bottlenecks is saving user-generated lists of Employees as static lists, so that users can come back and view the same list of employees at a later date.

Below is a simple example of the functionality I am looking for. The real list would be generated by a more complex query than the one in the example.

Scenario: A user searches for all Employees on night shift and wants to save this list to load it later. They want the list to always return the result as it was the first time they ran the search, i.e. if a new employee is added to the night shift, that employee should not appear on the list when they pull it up.


What I have tried:

Currently, there is a very poor solution that stores all of the IDs in the resulting list as a string array and then rebuilds a query using those IDs. This is very inefficient and causes problems with large lists having too many parameters.

I've also played around with constructing a table from a saved array of IDs and then joining against it; however, this is extremely slow on large lists (20,000+ employees) and often results in timeouts.

My current thought is to create a new table for each list and then call a JOIN on that table and the Employee table. However, if 100 users save 10 large lists (20,000+ employees) each, it quickly becomes a storage and table management nightmare.

I assume that this is a fairly common problem with a known solution. But I haven't been able to find any examples or best practices on how to store static lists (I'm probably searching for the wrong thing). Does anyone have any general concepts on how best to handle this type of use case?

Update: I think I tried the following setup years ago, but it wasn't working for some reason or another; looking back, it seems to make the most sense. I think the issue I had was NHibernate having problems with subselects in LINQ. But that is no longer a limitation.

I'm thinking of a StaticSavedLists table plus an index table that links each List to Person (Employee in the previous example) in a many-to-many mapping. The classes in C# would look like this:

public class StaticSavedList : BaseModel
{
    public string Name { get; set; }
    public IList<StaticSavedListPersonIdx> PersonsIdx { get; set; } //Has many persons
}

public class StaticSavedListPersonIdx : BaseModel
{
    public StaticSavedList StaticSavedList { get; set; }
    public Person Person { get; set; }
}

You probably need one table containing the "header" detail for the searches, which will include an ID for the search, and a second table with the entries for every search. Then you just need to get the ID somehow (maybe by user ID and date of shift) and use it to join the results table to the employees table.

Here's how a schema could look:

ShiftSearches

SearchID    int (PK)
ShiftDate   datetime

SearchResults

SearchID    int (PK)
EmployeeID  int (PK)

Employees

EmployeeID  int (PK)
FirstName   varchar
etc ...
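
As a rough illustration, the schema above could be mapped to EF6 code-first entities along these lines. This is only a sketch: the class names, the ShiftContext name, and the use of data annotations for the composite key are assumptions and not part of the original answer. It relies on the EntityFramework (EF6) NuGet package, whose default pluralizing convention would produce the table names ShiftSearches, SearchResults and Employees.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using System.Data.Entity; // EF6, from the EntityFramework NuGet package

// Header row: one per saved search.
public class ShiftSearch
{
    [Key]
    public int SearchID { get; set; }
    public DateTime ShiftDate { get; set; }
}

// Detail row: one per (search, employee) pair.
// The composite primary key is declared with Key + Column(Order = n).
public class SearchResult
{
    [Key, Column(Order = 0)]
    public int SearchID { get; set; }

    [Key, Column(Order = 1)]
    public int EmployeeID { get; set; }
}

public class Employee
{
    [Key]
    public int EmployeeID { get; set; }
    public string FirstName { get; set; }
    // Remaining employee columns omitted, as in the schema above.
}

// Hypothetical context name; exposes the three tables as DbSets.
public class ShiftContext : DbContext
{
    public DbSet<ShiftSearch> ShiftSearches { get; set; }
    public DbSet<SearchResult> SearchResults { get; set; }
    public DbSet<Employee> Employees { get; set; }
}
```

Keeping SearchResult as a plain two-column entity (rather than a hidden many-to-many join) makes it possible to query and bulk-load the link table directly.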

Possible LINQ query:

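// Look up the saved search by shift date (assumes exactly one search per date).
DateTime shiftDate = new DateTime(2014, 11, 26);
int searchId = db.ShiftSearches.Single(s => s.ShiftDate == shiftDate).SearchID;

// Join the saved result rows back to the Employees table.
var results = from r in db.SearchResults
              where r.SearchID == searchId
              join e in db.Employees on r.EmployeeID equals e.EmployeeID
              select e;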

This way you only need one results table for all saved lists, and it is very thin, so it should not take up much room - since you are storing only the required search and employee IDs, you can't really get the data any smaller.

The class structure you posted pretty much matches this concept.

Entity Framework may not be the appropriate choice of technology in all cases, and dealing with batches of 20k rows at a time might be one of these cases.

However, I believe your design of the data model is a good one. Below, in SQL only, it can be shown that 60k rows of (ListId, EmployeeId) pairs can be inserted into a sensibly clustered ListSearchEmployee table in under one second, and subsequently one of the 20k-row lists can be joined back to the full Employee rows within 1.8 seconds from a cold start.

The performance bottleneck is more likely to be the original user search - presumably this can be a nearly arbitrary query executed against your Employees + related tables, which will be difficult to index for all permutations.

Some performance suggestions (for the List save + refetch):

  • Use SqlBulkCopy to bulk dump EmployeeIds into the List tables.
  • Use something low level, like a basic SqlDataReader, to fetch the list data back for the UI (although I guess it will be paged? If so, a DbSet.SqlQuery with AsNoTracking(), i.e. with change tracking turned off, might suffice).
  • Don't bother with Foreign Keys on the List table - this will slow down inserts.
  • Don't try to synchronously clean up unwanted old List searches - queue them for deletion and have a background process do the deletion.
  • As a result, the ListSearchEmployee table will frequently need reindexing due to the large amount of churn on it.

-- Sample data setup - not considered in the timing
CREATE TABLE Employee
(
  EmployeeID INT identity(1,1),
  Name NVARCHAR(100),
  SomeOtherFieldToLessenTheDensityOfEmployee CHAR(500),

  PRIMARY KEY(EmployeeID)
);

CREATE TABLE ListSearch
(
  ListSearchID INT IDENTITY(1,1) PRIMARY KEY
  -- Other fields you may want to identify the search, e.g. date, which user, which filters etc
);

CREATE TABLE ListSearchEmployee
(
  ListSearchID INT,
  EmployeeID INT, -- Don't bother Foreign Keying for performance

  PRIMARY KEY CLUSTERED (ListSearchID, EmployeeID)
);

-- Insert 1M Employees
WITH cteData AS
(
   SELECT top 1000000 sc1.name, ROW_NUMBER() OVER (ORDER BY sc1.object_id) AS rn
   FROM sys.columns sc1 CROSS JOIN sys.columns sc2 CROSS JOIN sys.columns sc3
)
INSERT INTO Employee(Name)
SELECT name + CAST(rn AS VARCHAR)
FROM cteData;

-- Timing : 0.972 seconds on SQLExpress 2012 on an i3
-- Inserting 3 x 20k lists of pseudo-random employees (not contiguous on the EmployeeID cluster)
WITH cteData AS
(
   SELECT top 20000 1 as listid, ROW_NUMBER()  OVER (ORDER BY sc1.object_id) * 50 AS empid
   FROM sys.columns sc1 CROSS JOIN sys.columns sc2

  UNION ALL

   SELECT top 20000 2 as listid, ROW_NUMBER()  OVER (ORDER BY sc1.object_id) * 30 AS empid
   FROM sys.columns sc1 CROSS JOIN sys.columns sc2

  UNION ALL

   SELECT top 20000 3 as listid, ROW_NUMBER()  OVER (ORDER BY sc1.object_id) * 41 AS empid
   FROM sys.columns sc1 CROSS JOIN sys.columns sc2

)
INSERT INTO ListSearchEmployee(ListSearchID, EmployeeID)
SELECT listid, empid
FROM cteData;

DBCC DROPCLEANBUFFERS;

-- Timing : 1.751 seconds on SQLExpress 2012 on an i3
-- Joining 20k rows
SELECT *
FROM ListSearchEmployee el
INNER JOIN Employee e ON el.EmployeeID = e.EmployeeID
WHERE el.ListSearchID = 2;
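
The first two suggestions (SqlBulkCopy for the save, a plain data reader for the refetch) might look roughly like the following from C#. This is a minimal sketch against the sample ListSearchEmployee and Employee tables above; the StaticListStore class, its method names, the batch size, and the choice to return only the Name column are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class StaticListStore // hypothetical helper, not from the original answer
{
    // Saves one list by bulk copying (ListSearchID, EmployeeID) pairs.
    public static void SaveList(string connectionString, int listSearchId,
                                IEnumerable<int> employeeIds)
    {
        // Stage the pairs in memory in the same shape as the target table.
        var table = new DataTable();
        table.Columns.Add("ListSearchID", typeof(int));
        table.Columns.Add("EmployeeID", typeof(int));
        foreach (var employeeId in employeeIds)
        {
            table.Rows.Add(listSearchId, employeeId);
        }

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "ListSearchEmployee";
            bulk.BatchSize = 5000; // arbitrary; tune for your environment
            bulk.ColumnMappings.Add("ListSearchID", "ListSearchID");
            bulk.ColumnMappings.Add("EmployeeID", "EmployeeID");
            bulk.WriteToServer(table);
        }
    }

    // Reads a saved list back with a forward-only reader, bypassing EF.
    public static List<string> LoadEmployeeNames(string connectionString, int listSearchId)
    {
        const string sql =
            @"SELECT e.Name
              FROM ListSearchEmployee el
              INNER JOIN Employee e ON el.EmployeeID = e.EmployeeID
              WHERE el.ListSearchID = @listId;";

        var names = new List<string>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.Add("@listId", SqlDbType.Int).Value = listSearchId;
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Name is nullable in the sample schema, so guard against NULL.
                    names.Add(reader.IsDBNull(0) ? null : reader.GetString(0));
                }
            }
        }
        return names;
    }
}
```

A single int parameter identifies the whole list, which avoids the "too many parameters" problem described in the question regardless of list size.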
