What is the fastest way to check for duplicates in LINQ to Entities?

I have a SQL Azure table that stores strings. The user can upload files of new strings from a web browser, and I check for duplicates against the Entity Framework context. My code to add the de-duplicated strings to the context looks like this:

using (StreamReader sr = new StreamReader(theStream))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Any() issues a separate query to the database for every line read.
        if (!context.MyEntity.Any(o => o.String == line))
        {
            theString = new DAL.TheString();
            theString.String = line;
            context.MyEntity.Add(theString);
            totalAdded++;
        }
    }
}

Using MyEntity.Any() is way, way too slow. Handling 20,000 strings takes 40 minutes, and some simple instrumentation seems to point to the duplicate check.

My question is: what is the fastest way to do this within EF? Is L2E not the best tool for the job here? Should I get rid of EF altogether? Or should I just queue up the files and set up a background worker, because this is ALWAYS going to be slow?

Assuming your database table is small enough that all of its strings fit into memory, you can load them into a HashSet with a single query and then check each new line against that in-memory collection:

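// Load every existing string with a single query.
var lines = new HashSet<string>(context.MyEntity.Select(o => o.String));
using (StreamReader sr = new StreamReader(theStream))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Add returns false when the string is already in the set, so this
        // also filters out duplicates within the uploaded file itself.
        if (lines.Add(line))
        {
            context.MyEntity.Add(new DAL.TheString { String = line });
            totalAdded++;
        }
    }
}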
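
To see the HashSet check in isolation, here is a minimal sketch that needs no database. The names DuplicateFilter and FilterNew are hypothetical, and the existing strings are passed in as a plain collection rather than read from an EF context. It also assumes an ordinal, case-sensitive comparison; if your database column uses a case-insensitive collation, you would probably want a matching comparer such as StringComparer.OrdinalIgnoreCase.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of EF or the original code.
public static class DuplicateFilter
{
    // Returns the lines from 'reader' that are neither in 'existing'
    // nor repeated earlier in the same input, in their original order.
    public static List<string> FilterNew(TextReader reader, IEnumerable<string> existing)
    {
        // Assumption: ordinal (case-sensitive) comparison is what is wanted.
        var seen = new HashSet<string>(existing, StringComparer.Ordinal);
        var fresh = new List<string>();

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // HashSet<T>.Add returns true only if the item was not already present.
            if (seen.Add(line))
            {
                fresh.Add(line);
            }
        }
        return fresh;
    }
}
```

For example, calling DuplicateFilter.FilterNew(new StringReader("a\nb\na\nc"), new[] { "b" }) returns a list containing "a" and "c": "b" already exists, and the second "a" is a repeat within the input. Each lookup is a hash probe in memory rather than a query, which is where the speedup over a per-line Any() call would come from.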

If you don't have enough memory for this to work, your best bet is probably to create a trigger in the database that verifies the property is unique and discards any record that would create a duplicate. You can then attempt to add every line from your stream and let the database sort out which ones to keep once it has them all.
