How to optimise inserting multiple records (with exists check) via Entity Framework

I have a folder with about 200 CSV files, each containing about 6000 rows of mutual fund data. I have to copy this comma-separated data into the database via Entity Framework.

The two major objects are Mutual_Fund_Scheme_Details and Mutual_Fund_NAV_Details.

  • Mutual_Fund_Scheme_Details - this contains columns like Scheme_Name, Scheme_Code, Id, Last_Updated_On.

  • Mutual_Fund_NAV_Details - this contains Scheme_Id (foreign key), NAV, NAV_Date.

Each line in the CSV contains all of the above columns, so before inserting I have to:

  1. Split each line.
  2. Extract the scheme-related data first, check whether the scheme exists, and get its id. If it does not exist, insert the scheme details and get the new id.
  3. Using the id obtained in step 2, check whether a NAV entry exists for the same date. If not, insert it; otherwise skip it.
  4. If an entry is inserted in step 3, the scheme's Last_Updated_On date may need to be updated to the NAV date (if that date is newer than the existing value).
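
For concreteness, step 1 might look like the sketch below. The question does not give the file layout, so the column order (Scheme_Code, Scheme_Name, NAV, NAV_Date), the yyyy-MM-dd date format, and the names NavRow and CsvLine are all assumptions made for illustration. It also assumes fields never contain embedded commas.

```csharp
using System;
using System.Globalization;

// Hypothetical shape of one parsed CSV row (not from the question).
public record NavRow(string SchemeCode, string SchemeName, decimal Nav, DateTime NavDate);

public static class CsvLine
{
    // ASSUMED column order: Scheme_Code,Scheme_Name,NAV,NAV_Date (yyyy-MM-dd).
    // Returns null for lines that do not match, e.g. a header or a blank line.
    public static NavRow? TryParse(string line)
    {
        string[] parts = line.Split(',');
        if (parts.Length != 4) return null;

        // Parse with the invariant culture so "45.12" reads the same on every machine.
        if (!decimal.TryParse(parts[2].Trim(), NumberStyles.Number,
                              CultureInfo.InvariantCulture, out decimal nav))
            return null;

        if (!DateTime.TryParseExact(parts[3].Trim(), "yyyy-MM-dd",
                                    CultureInfo.InvariantCulture,
                                    DateTimeStyles.None, out DateTime navDate))
            return null;

        return new NavRow(parts[0].Trim(), parts[1].Trim(), nav, navDate);
    }
}
```

Parsing the whole file into a list of such rows first, before touching the database, is what makes it possible to batch the exists checks later.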

All the exists checks are done using the LINQ Any extension method, and all new entries are added to the DbContext, but SaveChanges is called only once, at the end of processing each file. I used to call it after each insert, but that took even longer than it does now.

Since each row involves at least two exists checks, and at most two inserts and one update, processing takes too long: close to 5-7 minutes per file. I am looking for suggestions to improve this. Any help would be useful.

Specifically, I am looking to:

  1. Reduce the time it takes to process each file
  2. Decrease the number of individual exists checks (if I can combine them in some way)
  3. Decrease the number of individual inserts/updates (if I can combine them in some way)

It's going to be hard to optimize this with EF. Here is a suggestion:

  1. Once you have processed the whole file (~6000 rows), do the exists check in a single query with .Where(x => listOfIdsFromFile.Contains(x.Id)). This should work for 6000 ids, and it will allow you to separate inserts from updates.
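
A minimal sketch of that suggestion follows. It is written against IQueryable so the same methods could be handed a DbSet or, as in the usage note, an in-memory list. The names NavRow, Scheme, NavDetail and BatchExistsCheck are hypothetical, and matching schemes on Scheme_Code (rather than on Id as in the snippet above) is an assumption about what the CSV carries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes: one parsed CSV row and simplified versions of the two entities.
public record NavRow(string SchemeCode, string SchemeName, decimal Nav, DateTime NavDate);

public class Scheme
{
    public int Id { get; set; }
    public string Scheme_Code { get; set; } = "";
    public string Scheme_Name { get; set; } = "";
    public DateTime Last_Updated_On { get; set; }
}

public class NavDetail
{
    public int Scheme_Id { get; set; }
    public decimal NAV { get; set; }
    public DateTime NAV_Date { get; set; }
}

public static class BatchExistsCheck
{
    // One query per file instead of one Any() call per row.
    public static Dictionary<string, Scheme> ExistingSchemes(
        IQueryable<Scheme> schemes, IReadOnlyCollection<NavRow> rows)
    {
        List<string> codes = rows.Select(r => r.SchemeCode).Distinct().ToList();
        return schemes.Where(s => codes.Contains(s.Scheme_Code))
                      .ToDictionary(s => s.Scheme_Code);
    }

    // Codes that appear in the file but not in the database: these become inserts.
    public static List<string> MissingSchemeCodes(
        Dictionary<string, Scheme> existing, IReadOnlyCollection<NavRow> rows)
    {
        return rows.Select(r => r.SchemeCode)
                   .Distinct()
                   .Where(code => !existing.ContainsKey(code))
                   .ToList();
    }

    // (Scheme_Id, NAV_Date) pairs already stored for the given schemes,
    // limited to dates on or after the earliest date in the file.
    public static HashSet<(int, DateTime)> ExistingNavKeys(
        IQueryable<NavDetail> navs, List<int> schemeIds, DateTime earliest)
    {
        var keys = navs
            .Where(n => schemeIds.Contains(n.Scheme_Id) && n.NAV_Date >= earliest)
            .Select(n => new { n.Scheme_Id, n.NAV_Date })
            .AsEnumerable() // continue in memory before building tuples
            .Select(k => (k.Scheme_Id, k.NAV_Date));
        return new HashSet<(int, DateTime)>(keys);
    }
}
```

With EF, the first argument would be the corresponding set on the DbContext. The missing schemes would be added and saved first so their ids exist, and each NAV row would then be checked against the HashSet in memory instead of with a per-row Any call. For a quick check without a database, new List<Scheme> { ... }.AsQueryable() can stand in for the set.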
