Faster way to search for a string in a large CSV file in C#

I have:

  • a DataTable (columns AccId and TerrName) containing more than 2,000 rows;
  • a large CSV file (columns AccId and External_ID) containing more than 6 million records.

Now I need to match on AccId and find the corresponding External_ID in the CSV file.

Currently I do it with the code below:

DataTable tblATL = Util.GetTable("ATL", false);
tblATL.Columns.Add("External_ID");

DataTable tbl = Util.CsvToTable("TT.csv", true);

foreach (DataRow columnRow in tblATL.Rows)
{
    // Linear scan: every row of tbl (6 million+) may be checked for each row of tblATL
    var query = tbl.Rows.Cast<DataRow>().FirstOrDefault(x => x.Field<string>("AccId") == columnRow["AccId"].ToString());
    if (query != null)
    {
        columnRow["External_ID"] = query.Field<string>("External_ID");
    }
    else
    {
        columnRow["External_ID"] = "New";
    }
}

This code works, but performance is a problem: it takes a very long time to get the result.

Please help. How can I improve its performance, or is there a better approach?
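A rough count, using the sizes stated above (about n = 2,000 DataTable rows and m = 6,000,000 CSV rows), shows why the loop is slow: `FirstOrDefault` scans `tbl.Rows` linearly, so an AccId that is not in the file costs m comparisons. The right-hand side assumes expected O(1) hash insertions and lookups; both figures are approximate lower bounds, since the question says "more than".

$$
\underbrace{n \cdot m}_{\text{nested scan, worst case}} = 2\,000 \times 6\,000\,000 = 1.2 \times 10^{10}
\qquad\text{vs.}\qquad
\underbrace{m + n}_{\text{hash index}} = 6\,002\,000
$$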

I suggest organizing the data into a dictionary, say, Dictionary<String, String[]>, which has O(1) expected lookup time: the CSV file is read once to build it, and each AccId is then found with a single hash lookup instead of a linear scan over millions of rows, e.g.

  Dictionary<String, String[]> Externals = File
    .ReadLines(@"C:\MyFile.csv")
    .Select(line => line.Split(',')) // the simplest, just to show the idea
    .ToDictionary(
      items => items[0], // key: the 1st column (AccId in TT.csv)
      items => items     // value: the whole record, or whatever representation you need
    );

  ....

  String accId = ...

  // TryGetValue avoids a KeyNotFoundException for an AccId that is not in the file
  String externalId = Externals.TryGetValue(accId, out String[] items) ? items[1] : "New";
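Put together with the question's DataTable, the idea might look like the sketch below. `ExternalIdMatcher`, `BuildIndex` and `FillExternalIds` are hypothetical names; the sketch assumes the CSV has a header row, the column order AccId, External_ID, and no quoted commas. It takes the lines as an `IEnumerable<string>` so it can be fed from `File.ReadLines` or from test data.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper class, not part of any library.
public static class ExternalIdMatcher
{
    // Builds an AccId -> External_ID map from lines shaped "AccId,External_ID".
    // Assumption: the first line is a header and no field contains a quoted comma.
    public static Dictionary<string, string> BuildIndex(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string line in lines.Skip(1)) // skip the header row
        {
            string[] fields = line.Split(',');
            if (fields.Length < 2)
                continue; // ignore blank or malformed lines

            // Keep the first External_ID seen for a repeated AccId
            if (!index.ContainsKey(fields[0]))
                index.Add(fields[0], fields[1]);
        }
        return index;
    }

    // Sets External_ID on every row, or "New" when the row's AccId is not in the index.
    public static void FillExternalIds(DataTable table, Dictionary<string, string> index)
    {
        if (!table.Columns.Contains("External_ID"))
            table.Columns.Add("External_ID");

        foreach (DataRow row in table.Rows)
        {
            string accId = row["AccId"].ToString();
            row["External_ID"] = index.TryGetValue(accId, out string externalId) ? externalId : "New";
        }
    }
}
```

With the question's files this would be called roughly as `ExternalIdMatcher.FillExternalIds(tblATL, ExternalIdMatcher.BuildIndex(File.ReadLines("TT.csv")));`. The dictionary holds all ~6 million entries in memory; if that is a concern, the roles could be reversed (index the ~2,000 AccIds from tblATL and stream the CSV once), which keeps only the small table in memory.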

EDIT: if the same key can appear more than once in the file, ToDictionary throws an ArgumentException, so you have to deal with duplicates, e.g.

 var csv = File
   .ReadLines(@"C:\MyFile.csv")
   .Select(line => line.Split(',')); // the simplest, just to show the idea

 Dictionary<String, String[]> Externals = new Dictionary<String, String[]>();

 foreach (var items in csv) {
   var key = items[0]; // key: the 1st column (AccId in TT.csv)
   var value = items;  // or whatever record representation

   if (!Externals.ContainsKey(key)) // keep the first record seen for each key
     Externals.Add(key, value);
   // else {
   //   //TODO: implement, if you want to deal with duplicates in some other way
   //}
 }
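If every record for a repeated key should be kept rather than only the first, one option (an alternative to the loop above, not the answer's own code) is LINQ's `Enumerable.ToLookup`, which groups values by key and never throws on duplicates. `DuplicateAwareIndex` is a hypothetical name; the sketch assumes plain comma-separated lines with no header and no quoted commas.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, shown only as an alternative to the ContainsKey/Add loop.
public static class DuplicateAwareIndex
{
    // Groups every record by its 1st column (AccId) so a repeated key keeps all of its records.
    // Assumption: plain comma-separated lines, no header row, no quoted commas.
    public static ILookup<string, string[]> Build(IEnumerable<string> lines) =>
        lines
            .Select(line => line.Split(','))
            .ToLookup(fields => fields[0], fields => fields);
}
```

Unlike a Dictionary indexer, `lookup[key]` returns an empty sequence for a missing key, so the "New" case becomes a check like `!lookup[accId].Any()`.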

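Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM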