Convert list collection into Dictionary for performance in C#

I have an Excel sheet with some rows. Alongside it, I have a db table with the same column structure as the Excel sheet. What I want to do is check whether a record exists in the Excel sheet but not in the db table, and if so insert it into a new table (an existing empty table) called TableReport for reporting purposes.

CSV File: PathToFile.csv

Original table to do the comparison on: FruitTable

New table created for reporting: TableReport

The following code snippets are what you can use to test my scenario. The code below works, using a List<T>, when PathToFile.csv is relatively small, say 4 rows: the program takes under 30 seconds to complete. However, in my real scenario PathToFile.csv has about 200 000 rows, and the list collection is not efficient enough. I have therefore considered using a Dictionary<TKey, TValue> collection, but I am stuck as to which parts of the code to tweak, because with either collection I will have to iterate over the entire csv to get all the rows and add them to the checkIfFruitsMatch list. Even if I use a dictionary, I would still need to loop and add them before I even perform the comparison, and that alone is already time-consuming. Performance is a critical requirement in this case.

I tried running the program with my current list implementation on a csv with 200 000 rows. It ran for well over 15 minutes, still looping through the csv and adding rows to the list, and had not finished when I terminated the program.

How can I make the program much faster? It shouldn't take longer than 10 minutes to execute. I have written this in a Windows Forms Application.

SQL Table Definition:

CREATE TABLE [dbo].[FruitTable](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Apples] [nvarchar](20) NOT NULL,
    [Oranges] [nvarchar](20) NOT NULL,
    [Pears] [nvarchar](20) NOT NULL,
    [Bananas] [nvarchar](20) NOT NULL,
    [DateAddedUtc] [nvarchar](50) NULL
) ON [PRIMARY]

GO

Stored Procedure Definition:

CREATE PROC [dbo].[spAddFruitsToDB]
@Apples [nvarchar](20),
@Oranges [nvarchar](20),
@Pears [nvarchar](20),
@Bananas [nvarchar](20),
@DateAddedUtc [nvarchar](50)
AS
BEGIN
    INSERT INTO TableReport --Has the same definition as FruitTable
    VALUES (@Apples, @Oranges, @Pears, @Bananas, @DateAddedUtc)
END

Code:

    public class FruitClass {

            private SqlConnection mySQLConnection;
            private SqlCommand mySQLCommand;
            private SqlDataReader mySQLDataReader;
            private string myConnectionString;

        private void CheckDataValidity()
        {   
            Microsoft.Office.Interop.Excel.Application Excel_app = new Microsoft.Office.Interop.Excel.Application();
            Microsoft.Office.Interop.Excel.Workbooks work_books = Excel_app.Workbooks;

            Microsoft.Office.Interop.Excel.Workbook work_book = work_books.Open("C:\\PathToFile.csv");

            Microsoft.Office.Interop.Excel.Sheets work_sheets = work_book.Worksheets;
            Microsoft.Office.Interop.Excel.Worksheet work_sheet = (Microsoft.Office.Interop.Excel.Worksheet)work_sheets.get_Item(1);

            List<FruitClass> checkIfFruitsMatch = new List<FruitClass>();
            List<FruitClass> dbFruitsToMatch= new List<FruitClass>();
            string fruitTagNumberForApples = "";

            for (int j = 2; j < work_sheet.Rows.Count; j++)
            {
                FruitClass fruitInstance = new FruitClass();
                fruitInstance.Apples = CellToString(work_sheet.Cells[j, 3]).Trim();
                fruitInstance.Oranges = CellToString(work_sheet.Cells[j, 13]).Trim();
                fruitInstance.Pears = CellToString(work_sheet.Cells[j, 14]).Trim();
                fruitInstance.Bananas = CellToString(work_sheet.Cells[j, 15]).Trim();

                fruitTagNumberForApples = fruitInstance.Apples;

                checkIfFruitsMatch.Add(fruitInstance);

                if (string.IsNullOrEmpty(fruitTagNumberForApples))
                    break;
            }

            //Get fruits in excel and do a comparison with fruits in database table Fruit
            dbFruitsToMatch.Add(ReturnFruitRow());

            IEnumerable<FruitClass> listComparer = checkIfFruitsMatch.Except(dbFruitsToMatch);
            foreach (FruitClass i in listComparer)
            {
                using (var db = new DBEntities())
                {
                    int countDBexisting = db.FruitTable.Where(x => x.Apples == i.Apples).Count();
                    if (countDBexisting > 0) 
                    {
                        //Fruit has been previously logged. No need to insert a duplicate
                    }
                    else
                    {
                        LogFruitToDB(i, "spAddFruitsToDB"); //Insert records into a new table called "TableReport"
                    }
                }
            }

            work_book.Close();
            Excel_app.Quit();

        }


        private void LogFruitToDB(FruitClass fruitInstance, string cmdText)
            {
                myConnectionString = ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString;
                using (mySQLConnection = new SqlConnection(myConnectionString))
                {
                    mySQLCommand = new SqlCommand(cmdText, mySQLConnection);
                    mySQLCommand.CommandType = CommandType.StoredProcedure;

                    SqlParameter Apples_Parameter = new SqlParameter
                    {
                        ParameterName = "@Apples",
                        Value = fruitInstance.Apples
                    };
                    mySQLCommand.Parameters.Add(Apples_Parameter);

                    SqlParameter Oranges_Parameter = new SqlParameter
                    {
                        ParameterName = "@Oranges",
                        Value = fruitInstance.Oranges
                    };
                    mySQLCommand.Parameters.Add(Oranges_Parameter);

                    SqlParameter Pears_Parameter = new SqlParameter
                    {
                        ParameterName = "@Pears",
                        Value = fruitInstance.Pears
                    };
                    mySQLCommand.Parameters.Add(Pears_Parameter);

                    SqlParameter Bananas_Parameter = new SqlParameter
                    {
                        ParameterName = "@Bananas",
                        Value = fruitInstance.Bananas
                    };
                    mySQLCommand.Parameters.Add(Bananas_Parameter);

                    SqlParameter DateAddedUtc_Parameter = new SqlParameter
                    {
                        ParameterName = "@DateAddedUtc",
                        Value = DateTime.UtcNow.ToString()
                    };
                    mySQLCommand.Parameters.Add(DateAddedUtc_Parameter);

                    mySQLConnection.Open();
                    mySQLCommand.ExecuteNonQuery();
                }
            }

private FruitClass ReturnFruitRow()
{
    FruitClass fruitInfo = new FruitClass();
    myConnectionString = ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString;
    using (mySQLConnection = new SqlConnection(myConnectionString))
    {
        string procedureName = "select * from dbo.FruitTable";
        mySQLCommand = new SqlCommand(procedureName, mySQLConnection);
        mySQLCommand.CommandType = CommandType.Text;
        mySQLCommand.Connection = mySQLConnection;
        mySQLCommand.Connection.Open();
        mySQLDataReader = mySQLCommand.ExecuteReader();
        if (mySQLDataReader.HasRows)
        {
            while (mySQLDataReader.Read())
            {
                fruitInfo.Apples = mySQLDataReader.GetString(1);
                fruitInfo.Oranges = mySQLDataReader.GetString(2);
                fruitInfo.Pears = mySQLDataReader.GetString(3);
                fruitInfo.Bananas = mySQLDataReader.GetInt32(4).ToString();
            }
        }
        mySQLCommand.Connection.Close();
    }
    return fruitInfo;
}

        private string CellToString(object p)
        {
            try
            {
                return ((Microsoft.Office.Interop.Excel.Range)p).Value.ToString();
            }
            catch
            {
                return "";
            }
        }
    }

    public class FruitClass
    {
        public string Apples;
        public string Oranges;
        public string Pears;
        public string Bananas;
    }

Note: The csv file came in as a normal .xlsx Excel file with columns and was then saved as a .csv.

Test.csv is shown below: [image]

So, say FruitTable had a matching record: [image]

Then TableReport should look like the following when the program is finished: [image]

But the real scenario has about 200 000 records. It is also worth mentioning that this application is run once a month.

I think the problem is that you are reading the .csv via interop.

You will save a lot of time if you read the csv as a flat file and then manipulate its values.

See: Reading CSV file and storing values into an array
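As a rough sketch of the flat-file idea, the snippet below reads the file with File.ReadLines instead of Excel interop and then filters out the rows that are already known. Everything named here (FruitRow, CsvFruitReader, ParseLine, ReadRows, MissingFromDb) is hypothetical and not from the question. It assumes a plain separator with no quoted fields, a header row, and that a record is identified by its Apples value, as in the question's db.FruitTable query. Rows with an empty Apples value are skipped rather than ending the loop. It also uses a HashSet<string> rather than the Dictionary the question mentions, since only a membership test is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical record type mirroring the four fields used in the question.
public sealed class FruitRow
{
    public string Apples;
    public string Oranges;
    public string Pears;
    public string Bananas;
}

public static class CsvFruitReader
{
    // Parses one line of the file. Assumes a simple separator and no quoted fields.
    // Indexes 2, 12, 13, 14 correspond to Excel columns 3, 13, 14, 15 in the question.
    // Returns null when the line has too few fields.
    public static FruitRow ParseLine(string line, char separator = ',')
    {
        string[] fields = line.Split(separator);
        if (fields.Length < 15)
            return null;

        return new FruitRow
        {
            Apples = fields[2].Trim(),
            Oranges = fields[12].Trim(),
            Pears = fields[13].Trim(),
            Bananas = fields[14].Trim()
        };
    }

    // Streams the file line by line, skipping the header row (assumed present)
    // and any row without an Apples value.
    public static IEnumerable<FruitRow> ReadRows(string path, char separator = ',')
    {
        return File.ReadLines(path)
            .Skip(1)
            .Select(line => ParseLine(line, separator))
            .Where(row => row != null && row.Apples.Length > 0);
    }

    // Keeps only the rows whose Apples value is not among the values already in
    // the database. The HashSet makes each lookup O(1) on average.
    public static List<FruitRow> MissingFromDb(IEnumerable<FruitRow> csvRows,
                                               IEnumerable<string> dbApples)
    {
        var existing = new HashSet<string>(dbApples);
        return csvRows.Where(row => !existing.Contains(row.Apples)).ToList();
    }
}
```

The dbApples argument would come from a single query such as select Apples from dbo.FruitTable, run once before the comparison, instead of one query per csv row. The HashSet compares strings ordinally by default; if the database collation is case-insensitive, passing StringComparer.OrdinalIgnoreCase to the HashSet constructor may be closer to what the database would do.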

Your method will be very expensive in terms of performance. I suggest another approach:

1) Create a temp_report table that has the structure of the rows you want to put in your report table.

2) At the beginning of your program, empty this new table:

delete from dbo.temp_report

3) Get an empty DataTable, yourdatatable, with:

DataTable yourdatatable = new DataTable();
SqlConnection conn = new SqlConnection (connString);
SqlCommand cmd = new SqlCommand ("select * from dbo.temp_report", conn);
conn.Open ();

// create data adapter
SqlDataAdapter da = new SqlDataAdapter (cmd);

// this will be your datatable
da.Fill (yourdatatable);
conn.Close ();

4) Insert your csv lines into the DataTable (C#):

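// csvLines is assumed to hold the lines of the csv file (e.g. from File.ReadAllLines).
// "id" and "item" are placeholder column names; use the columns of dbo.temp_report.
for (int i = 0; i < csvLines.Length; i++)
{
    DataRow row = yourdatatable.NewRow();
    row["id"] = i;
    row["item"] = "item " + i.ToString();
    yourdatatable.Rows.Add(row);
}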

5) Send all your lines to the database using the bulk copy method:

using (SqlConnection destinationConnection = new SqlConnection (connString))
{
   destinationConnection.Open ();

   using (SqlBulkCopy bulkCopy = new SqlBulkCopy (destinationConnection))
   {
       bulkCopy.DestinationTableName = "dbo.temp_report";

       try
       {
           // Write from the source to the destination.
           bulkCopy.WriteToServer (yourdatatable);
       }
       catch (Exception ex)
       {
           Console.WriteLine (ex.Message);
       }

   }
}

6) Insert into the TableReport table the rows that are not in the FruitTable table by running a single SQL query that computes the difference (rather than doing it in C# code).

7) Note that you can also improve performance by reading your csv file as text and splitting each line on the separator (tab, comma, etc., depending on your file).
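For step 6, one possible shape of the difference query is an INSERT ... SELECT with a NOT EXISTS filter, run once from C#. This is only a sketch under assumptions: ReportDiff, InsertMissingSql and InsertMissingRows are hypothetical names, dbo.temp_report is assumed to have the Apples, Oranges, Pears, Bananas and DateAddedUtc columns, and a row counts as already present when its Apples value matches, as in the question's db.FruitTable query.

```csharp
using System.Data.SqlClient;

public static class ReportDiff
{
    // Set-based difference: copy every staged row whose Apples value has no
    // match in FruitTable. Column names are assumed from the question's schema.
    public const string InsertMissingSql = @"
INSERT INTO dbo.TableReport (Apples, Oranges, Pears, Bananas, DateAddedUtc)
SELECT t.Apples, t.Oranges, t.Pears, t.Bananas, t.DateAddedUtc
FROM dbo.temp_report AS t
WHERE NOT EXISTS (SELECT 1 FROM dbo.FruitTable AS f WHERE f.Apples = t.Apples);";

    // Runs the statement once and returns the number of rows inserted.
    public static int InsertMissingRows(string connString)
    {
        using (var conn = new SqlConnection(connString))
        using (var cmd = new SqlCommand(InsertMissingSql, conn))
        {
            conn.Open();
            return cmd.ExecuteNonQuery();
        }
    }
}
```

Calling ReportDiff.InsertMissingRows(connString) after the bulk copy in step 5 lets the database do the comparison in one statement, so there is no per-row round trip from the application. An index on FruitTable.Apples would likely help the NOT EXISTS lookup, though that depends on the actual data.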
