Improving the performance of converting a large Excel file to a list of objects

Question

I know this question has been asked multiple times, but I could not find much help in any of those answers.

I don't want to convert the Excel file into a data table; I want it converted to a list of objects and sent to the server side for processing.

If it has more than 2K rows, it should throw an error. Currently what I am doing is something like:

    using (var excel = new ExcelPackage(hpf.InputStream))
    {
        var ws = excel.Workbook.Worksheets["Sheet1"];

        for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
        {
            if (ws.Cells[rw, 1].Value != null)
            {
                int headerRow = 2;

                GroupMembershipUploadInput gm = new GroupMembershipUploadInput();

                for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++)
                {
                    var s = ws.Cells[rw, col].Value;

                    if (ws.Cells[headerRow, col].Value.ToString().Equals("Existing Constituent Master Id"))
                    {
                        gm.cnst_mstr_id = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
                    }
                    else if (ws.Cells[headerRow, col].Value.ToString().Equals("Prefix of the constituent(Mr, Mrs etc)"))
                    {
                        gm.cnst_prefix_nm = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
                    }
                    else if (ws.Cells[headerRow, col].Value.ToString().Equals("First Name of the constituent(Mike)"))
                    {
                        gm.cnst_first_nm = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
                    }
                    .....................
                    .....................
                }
            }

            iUploadedCnt = iUploadedCnt + 1; //Increase the count by 1
        }

        if (lgl.GroupMembershipUploadInputList.Count < 2003) //Check for the uploaded list count
        {
            //throw the error
        }
    }

But this approach seems slow.

Converting the Excel file to a list seems slow to me. For example, when I upload more than 2K records, the whole sheet is first converted to a list, and only then is the count checked against 2003. This process is definitely slower.

How can this be achieved in a faster or better way?

Answer 1

You do a lot of repeated string processing that is unnecessary. For each row, you check the column headers again to see whether they match some predefined value (for instance, if (ws.Cells[headerRow, col].Value.ToString().Equals("Existing Constituent Master Id"))).

You could do this once, before you start parsing the rows, and create for instance a Dictionary<int, SomeEnum> that maps the column number to a specific enum value. When parsing the rows, you can then do a quick lookup in the dictionary to find which column maps to which property.

Furthermore, you define var s = ws.Cells[rw, col].Value; but never use it. Instead, you read the cell value again when you assign it to a property of your object. You could make the necessary conversions and checks once, at this point, and then use only s:

// define this enum somewhere
enum ColumnPropEnum {
   cnst_mstr_id,  cnst_prefix_nm, ...
}

// define this dictionary somewhere
Dictionary<int, ColumnPropEnum> colprops = new Dictionary<int, ColumnPropEnum>();

// do this once before processing all rows
int headerRow = 2;
for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++) {
    // read the header text once per column; a blank header cell yields ""
    var header = ws.Cells[headerRow, col].Value?.ToString() ?? "";
    if (header.Equals("Existing Constituent Master Id"))
        colprops.Add(col, ColumnPropEnum.cnst_mstr_id);
    else if (header.Equals(" ..."))
        colprops.Add(col, ColumnPropEnum.cnst_prefix_nm);
    ...
}


// now use this dictionary in each row
for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
{
    ....
    for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++) {

        // ?. returns null when Value is null, otherwise the result of ToString();
        // ?? then replaces a null result with "", so s is never null
        var s = ws.Cells[rw, col].Value?.ToString() ?? "";
        ColumnPropEnum cp;
        if (colprops.TryGetValue(col, out cp)) {
            switch (cp) {
                case ColumnPropEnum.cnst_mstr_id: gm.cnst_mstr_id = s; break;
                case ColumnPropEnum.cnst_prefix_nm: gm.cnst_prefix_nm = s; break;
                ...
            }
        }
    }
}
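
The snippets above depend on an EPPlus worksheet and elide several parts, so they cannot be run in isolation. As a minimal sketch of the same "resolve the headers once" idea, the following replaces the worksheet with a plain object[,] grid (an assumption made purely for illustration, with 0-based indices) and uses a dictionary of setter delegates instead of the enum plus switch. That is a variation on the approach, not what the answer prescribes. GroupMember and RowMapper are hypothetical names; the header strings are the three shown in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for GroupMembershipUploadInput (only three fields shown).
public class GroupMember
{
    public string cnst_mstr_id = "";
    public string cnst_prefix_nm = "";
    public string cnst_first_nm = "";
}

public static class RowMapper
{
    // Header text -> setter. The header strings are taken from the question.
    static readonly Dictionary<string, Action<GroupMember, string>> HeaderSetters =
        new Dictionary<string, Action<GroupMember, string>>
        {
            ["Existing Constituent Master Id"] = (g, v) => g.cnst_mstr_id = v,
            ["Prefix of the constituent(Mr, Mrs etc)"] = (g, v) => g.cnst_prefix_nm = v,
            ["First Name of the constituent(Mike)"] = (g, v) => g.cnst_first_nm = v,
        };

    // Resolve each column's setter once from the header row; unknown headers are ignored.
    public static Dictionary<int, Action<GroupMember, string>> BuildColumnMap(object[,] grid, int headerRow)
    {
        var map = new Dictionary<int, Action<GroupMember, string>>();
        for (int col = 0; col < grid.GetLength(1); col++)
        {
            string header = grid[headerRow, col]?.ToString() ?? "";
            if (HeaderSetters.TryGetValue(header, out var setter))
                map[col] = setter;
        }
        return map;
    }

    // Convert the data rows; rows whose first cell is null are skipped, as in the question.
    public static List<GroupMember> ReadRows(object[,] grid, int headerRow, int firstDataRow)
    {
        var map = BuildColumnMap(grid, headerRow);
        var result = new List<GroupMember>();
        for (int row = firstDataRow; row < grid.GetLength(0); row++)
        {
            if (grid[row, 0] == null) continue;
            var gm = new GroupMember();
            foreach (var entry in map)
                entry.Value(gm, grid[row, entry.Key]?.ToString() ?? "");
            result.Add(gm);
        }
        return result;
    }
}
```

With this shape, no header string is compared inside the row loop: each cell costs one delegate call on a column that was mapped up front, and unmapped columns are never read.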

I'm not sure at which point you add this object to a list or upload it to the server, as this is not part of the code shown. But it could be faster to first check only the first column of each row to see whether you have the necessary count of non-null values, throw an error if you don't, and do all the other processing only if no error was thrown.

int rowcount = 0;
//If you need at minimum 2000 rows, you can stop after you count 2000 valid rows
for (int rw = 4; rw <= ws.Dimension.End.Row && rowcount < 2000; rw++)        
{
    if (ws.Cells[rw, 1].Value != null) rowcount++;
}

if (rowcount < 2000) {
    //throw error and return
} 

//else do the list building and uploading
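
The question's text describes the limit as a maximum (more than 2K rows is an error), whereas the loop above treats 2000 as a minimum. If the limit is in fact a maximum, the same early-exit idea applies in the other direction: stop scanning as soon as the count exceeds the limit. A minimal sketch, assuming the first column is available as a plain sequence of cell values rather than an EPPlus worksheet; RowLimit and ExceedsLimit are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RowLimit
{
    // Returns true as soon as more than maxRows non-null cells have been seen,
    // so an oversized upload is rejected without scanning the rest of the column.
    public static bool ExceedsLimit(IEnumerable<object> firstColumn, int maxRows)
    {
        int count = 0;
        foreach (var cell in firstColumn)
        {
            if (cell == null) continue;          // empty rows do not count
            if (++count > maxRows) return true;  // early exit on the first excess row
        }
        return false;
    }
}
```

With EPPlus, the sequence could be produced by reading ws.Cells[rw, 1].Value for rw from 4 to ws.Dimension.End.Row, as in the loops above; the objects would then be built only when ExceedsLimit returns false.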

Solution 1
0 2016-07-04 11:00:41
