
LINQ - Querying about 6000 unique records by WHERE clause

I've got a problem and no idea how to solve it. Imagine you have a List<int> with about 6000 unique IDs that match the IDs of a SQL table with about a million records. I want to select the 6000 records that match those IDs via LINQ from my C# program. I don't want to use Contains() because it gets very slow in translation and the argument list gets too big.

Any other ideas on how to solve this?


Something about my scenario (this is not the real one, but a similar scenario):

I've got a Service that is connected to a database. A Client requests a batch of items, for example Persons. The Service accepts the request, queries the DB and sends the data back to the Client.

Person = (PersonID, Prename, Lastname)

Now the Client holds a temporary list of Persons. With an additional method I would like to retrieve the addresses of these Persons from the Service. So I send a list of PersonIDs to the Service, which should give me back a list of Addresses that reference those Persons.

I wouldn't recommend this. As great a tool as LINQ is, there are some scenarios where trying to get smart with your data by handling it code-side can be quite detrimental to application performance.

You've got a list of these IDs somewhere. If they are in the database, why not do the whole operation as a stored procedure and just return the results? That way you don't have to push an expensive query across the wire. It all happens in your database, so you minimise traffic and likely improve responsiveness.

6000 items might not seem like a lot to bother with this for, but realistically, as you said, performance can be a bit of a nightmare when trying to do a select with datasets of this size.

Some ideas:

Insert the 6000 IDs into a temporary table and join that temporary table to your million-record one.

Use Contains() and select in batches of N, where N = 500, 1000, etc., instead of all 6000 at once.
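The batching idea could be sketched roughly as follows. This is only a minimal sketch: the IdBatching class and InBatches method are made-up names for illustration, and the query in the trailing comment assumes a LINQ to SQL context called dc with a Companies table, as used elsewhere in this thread. The batch size should stay below SQL Server's limit of about 2100 parameters per statement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdBatching
{
    // Splits the ID list into consecutive chunks of at most batchSize items.
    // (Hypothetical helper, not part of LINQ or LINQ to SQL.)
    public static IEnumerable<List<int>> InBatches(List<int> ids, int batchSize)
    {
        if (ids == null) throw new ArgumentNullException(nameof(ids));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int offset = 0; offset < ids.Count; offset += batchSize)
        {
            // The last chunk may be shorter than batchSize.
            int count = Math.Min(batchSize, ids.Count - offset);
            yield return ids.GetRange(offset, count);
        }
    }
}

// Possible usage against a data context (assumed names, one round trip per batch):
//
// var results = new List<Company>();
// foreach (var batch in IdBatching.InBatches(ids, 1000))
// {
//     results.AddRange(dc.Companies.Where(c => batch.Contains(c.CompanyID)));
// }
```

Each batch becomes its own IN (...) query, so you trade one huge statement for a handful of smaller ones. Whether that is actually faster depends on your round-trip cost, so measure it.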

Using Contains() will make LINQ generate a very large SQL statement.

If you are using Entity Framework (EF), you can use an inner join between your data (data) and your table (Customers):

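void Main()
{
    // Local sequence of IDs to look up
    var data = Enumerable.Range(1, 6000);

    // Inner join between the local IDs and the Customers table
    var result = from x in data
                 join y in Customers
                 on x equals y.CustomerID
                 select x;

    // Dump() is LINQPad's output helper
    result.Dump();
}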

Typically, I've found that XML performs best for large IN criteria for IDs. It also gets around the limit of 2100 parameters in SQL Server, which you'll hit if you do a Contains in LINQ.

I would suggest:

  • make the List
  • serialize it to XML
  • create a stored procedure called ContainsXYZ that takes the XML as a parameter
  • have your stored procedure use XPath to extract the IDs and join on them
  • assuming you're using Entity Framework, you can map this stored procedure, execute it, then materialize the results into regular entities.
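The serialization step might look something like this. It is a sketch only: the IdXml class and the <ids>/<id> element names are my own choice for illustration, and the stored procedure would need to expect the same shape.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class IdXml
{
    // Builds a compact document of the form <ids><id>1</id><id>2</id>...</ids>.
    // (Hypothetical helper; element names are an assumption.)
    public static string ToXml(IEnumerable<int> ids)
    {
        var root = new XElement("ids", ids.Select(id => new XElement("id", id)));

        // DisableFormatting avoids indentation and newlines in the payload.
        return root.ToString(SaveOptions.DisableFormatting);
    }
}

// IdXml.ToXml(new[] { 1, 2, 3 })
// returns "<ids><id>1</id><id>2</id><id>3</id></ids>"
```

The resulting string goes to the stored procedure as a single xml parameter, so the 2100-parameter limit never comes into play. On the SQL Server side, the xml type's nodes() and value() methods are the usual way to turn such a document back into rows to join against.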

Just to get this out of the way...

 var joinTry = from company in dc.Companies
               join id in list on company.CompanyID equals id
               select company;

Doesn't work. LinqToSql won't let you join. "Local sequence cannot be used in LINQ to SQL implementations of query operators except the Contains operator."

 var containsTry = from company in dc.Companies
                   where list.Contains(company.CompanyID)
                   select company;

Does work, in the predictable way:

SELECT [t0].[CompanyID], [t0].[CompanyName]
FROM [Company] AS [t0]
WHERE [t0].[CompanyID] IN (@p0, @p1, @p2, @p3, @p4, @p5, ...

As dirty as this is, there is seriously no faster way to get a list of ints to SQL Server. The overhead of any call is far larger than the cost of parsing the list.

SELECT
    c.CompanyId,
    c.CompanyName
FROM Company c
    WHERE CompanyID IN (1,2,3,4,5,6,7,8,9,10)

...has the same execution speed as this (generated by LINQ)...

exec sp_executesql N'SELECT [t0].[CompanyID], [t0].[CompanyName]
FROM [Company] AS [t0]
WHERE [t0].[CompanyID] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9)',N'@p0 int,@p1 int,@p2 int,@p3 int,@p4 int,@p5 int,@p6 int,@p7 int,@p8 int,@p9 int',@p0=1,@p1=2,@p2=3,@p3=4,@p4=5,@p5=6,@p6=7,@p7=8,@p8=9,@p9=10

...and is twice as fast as...

SELECT
    c.CompanyId,
    c.CompanyName
FROM Company c
        /* @Test is a table variable with 1-10 in it */
    INNER JOIN @Test t ON t.ID = c.CompanyID

You really don't need to optimize SQL Server's handling of a list of integers. In the IN() solution, SQL Server puts the integers in an index it generates on the fly anyway.

The real questions should be "What am I representing with a list of 6000 integers?" and "Should I put this list in a table?". Any solution that takes a client-side list of 6000 integers and sends it to the server will have at least as much overhead as a solution that uses Contains(). If you use LinqToSQL, you sort of have to buy into the paradigm to a certain extent.

If this still makes you feel dirty, you might try creating a table for arbitrary restriction lists. Two columns, both ints. Then you can insert your IDs into that table and just use...

var searchTry = from company in dc.Companies
                join search in dc.SearchLists on company.CompanyID equals search.ValueID
                where search.SearchID == savedSearchID
                select company;
