
Improve performance when converting a large Excel file to a list of objects

I know this question has been asked multiple times, but none of those answers helped me much.

I don't want to convert the Excel file into a DataTable. Instead, I want it converted to a list of objects and sent to the server side for processing.

If the file has more than 2K rows, it should throw an error. Currently I am doing something like this:

    using (var excel = new ExcelPackage(hpf.InputStream))
    {
        var ws = excel.Workbook.Worksheets["Sheet1"];

        for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
        {
            if (ws.Cells[rw, 1].Value != null)
            {
                int headerRow = 2;

                GroupMembershipUploadInput gm = new GroupMembershipUploadInput();

                for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++)
                {
                    var s = ws.Cells[rw, col].Value;

                    if (ws.Cells[headerRow, col].Value.ToString().Equals("Existing Constituent Master Id"))
                    {
                        gm.cnst_mstr_id = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
                    }
                    else if (ws.Cells[headerRow, col].Value.ToString().Equals("Prefix of the constituent(Mr, Mrs etc)"))
                    {
                        gm.cnst_prefix_nm = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
                    }
                    else if (ws.Cells[headerRow, col].Value.ToString().Equals("First Name of the constituent(Mike)"))
                    {
                        gm.cnst_first_nm = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
                    }
                    // ... (remaining columns elided)
                }

                iUploadedCnt = iUploadedCnt + 1; // Increase the count by 1
            }
        }

        if (lgl.GroupMembershipUploadInputList.Count > 2003) // Check the uploaded list count
        {
            // throw the error
        }
    }

But this approach seems slow.

Converting the Excel file to a list seems slow to me. For example, when I upload more than 2K records, all rows are first converted to a list, and only then is the count checked against 2003. This process is definitely slower than it needs to be.

How can this be done in a faster or better way?

You do a lot of unnecessary, repeated string processing. For each row, you check the column headers again to see whether they match some predefined value (for instance if (ws.Cells[headerRow, col].Value.ToString().Equals("Existing Constituent Master Id"))).

You could do this once, before you start parsing the rows, and build for instance a Dictionary<int, SomeEnum> that maps each column number to a specific enum value. When parsing the rows, a quick dictionary lookup then tells you which property a column maps to.
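A minimal, self-contained sketch of that one-time mapping step, assuming the header cells of row 2 have already been read into a string array (for instance via ws.Cells[2, col].Value?.ToString()). HeaderMap and Build are hypothetical names, and the enum lists only the three columns named in the question. A header-text dictionary stands in for the if/else chain, so each header is compared once per file instead of once per row.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum: only the three columns named in the question are listed
public enum ColumnPropEnum { cnst_mstr_id, cnst_prefix_nm, cnst_first_nm }

public static class HeaderMap
{
    // Header text -> property, copied from the question's comparisons
    private static readonly Dictionary<string, ColumnPropEnum> KnownHeaders =
        new Dictionary<string, ColumnPropEnum>
        {
            ["Existing Constituent Master Id"] = ColumnPropEnum.cnst_mstr_id,
            ["Prefix of the constituent(Mr, Mrs etc)"] = ColumnPropEnum.cnst_prefix_nm,
            ["First Name of the constituent(Mike)"] = ColumnPropEnum.cnst_first_nm,
        };

    // Called once per file. headers[i] holds the header text of column firstColumn + i.
    // Unknown or empty headers are skipped instead of causing an exception.
    public static Dictionary<int, ColumnPropEnum> Build(IReadOnlyList<string> headers, int firstColumn)
    {
        var map = new Dictionary<int, ColumnPropEnum>();
        for (int i = 0; i < headers.Count; i++)
        {
            if (headers[i] != null && KnownHeaders.TryGetValue(headers[i], out var prop))
                map.Add(firstColumn + i, prop);
        }
        return map;
    }
}
```

Inside the row loop, map.TryGetValue(col, out var prop) then replaces all of the per-row header comparisons.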

Furthermore, you define var s = ws.Cells[rw, col].Value; but never use it. Instead, you read the cell value again when you assign it to a property of your object. You could do the necessary conversions and checks once, right there, and then use only s:

// define this enum somewhere
enum ColumnPropEnum {
    cnst_mstr_id, cnst_prefix_nm, cnst_first_nm,
    // ... one value per remaining column
}

// define this field somewhere: maps a column number to the property it fills
Dictionary<int, ColumnPropEnum> colprops = new Dictionary<int, ColumnPropEnum>();

int headerRow = 2;

// do this once, before processing any rows
for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++) {
    // ?. avoids a NullReferenceException on an empty header cell
    var header = ws.Cells[headerRow, col].Value?.ToString();
    if (header == "Existing Constituent Master Id")
        colprops.Add(col, ColumnPropEnum.cnst_mstr_id);
    else if (header == "Prefix of the constituent(Mr, Mrs etc)")
        colprops.Add(col, ColumnPropEnum.cnst_prefix_nm);
    // ... remaining headers
}

// now use this dictionary in each row
for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
{
    // as in the question: skip empty rows and create one object per row
    if (ws.Cells[rw, 1].Value == null) continue;
    GroupMembershipUploadInput gm = new GroupMembershipUploadInput();

    for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++) {

        // The single ? checks whether Value is null: if so it returns null, otherwise Value.ToString().
        // The double ?? then checks whether that result is null: if so it assigns "" to s,
        // otherwise the result of ToString().
        var s = ws.Cells[rw, col].Value?.ToString() ?? "";
        ColumnPropEnum cp;
        if (colprops.TryGetValue(col, out cp)) {
            switch (cp) {
                case ColumnPropEnum.cnst_mstr_id: gm.cnst_mstr_id = s; break;
                case ColumnPropEnum.cnst_prefix_nm: gm.cnst_prefix_nm = s; break;
                // ...
            }
        }
    }
    // ...
}
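A variant of the same two-phase approach, sketched under stated assumptions: the worksheet values are stood in by a 0-based object[,] grid, GroupMembershipUploadInput is reduced to three string fields, and RowParser, BuildSetters and Parse are hypothetical names. Instead of an enum plus a switch, each mapped column stores a setter delegate, and each row visits only the mapped columns rather than every column in the sheet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's class (only three of its fields)
public class GroupMembershipUploadInput
{
    public string cnst_mstr_id;
    public string cnst_prefix_nm;
    public string cnst_first_nm;
}

public static class RowParser
{
    // Built once per file: column index -> setter for the property that column fills.
    // Headers that match nothing (including null) get no entry.
    public static Dictionary<int, Action<GroupMembershipUploadInput, string>> BuildSetters(
        IReadOnlyList<string> headers)
    {
        var setters = new Dictionary<int, Action<GroupMembershipUploadInput, string>>();
        for (int col = 0; col < headers.Count; col++)
        {
            switch (headers[col])
            {
                case "Existing Constituent Master Id": setters[col] = (gm, s) => gm.cnst_mstr_id = s; break;
                case "Prefix of the constituent(Mr, Mrs etc)": setters[col] = (gm, s) => gm.cnst_prefix_nm = s; break;
                case "First Name of the constituent(Mike)": setters[col] = (gm, s) => gm.cnst_first_nm = s; break;
            }
        }
        return setters;
    }

    // cells[r, c] stands in for ws.Cells[row, col].Value of the data rows.
    // Rows whose first cell is null are skipped, as in the question.
    public static List<GroupMembershipUploadInput> Parse(
        object[,] cells, Dictionary<int, Action<GroupMembershipUploadInput, string>> setters)
    {
        var result = new List<GroupMembershipUploadInput>();
        for (int r = 0; r < cells.GetLength(0); r++)
        {
            if (cells[r, 0] == null) continue;
            var gm = new GroupMembershipUploadInput();
            foreach (var entry in setters)
            {
                // Each cell is read once; a null value becomes ""
                var s = cells[r, entry.Key]?.ToString() ?? "";
                entry.Value(gm, s);
            }
            result.Add(gm);
        }
        return result;
    }
}
```

The delegate table trades the switch for one allocation per mapped column at setup time; per row, the cost is one dictionary enumeration and one delegate call per mapped cell.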

I'm not sure where you add this object to a list or upload it to the server, as that is not part of the code. But it could be faster to first check only the first column of each row to see whether you have the necessary number of non-null values, throw an error if not, and do all the other processing only if no error was thrown.

int rowcount = 0;
//If you need at minimum 2000 rows, you can stop after you count 2000 valid rows
for (int rw = 4; rw <= ws.Dimension.End.Row && rowcount < 2000; rw++)        
{
    if (ws.Cells[rw, 1].Value != null) rowcount++;
}

if (rowcount < 2000) {
    //throw error and return
} 

//else do the list building and uploading
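The question actually states an upper limit: more than 2K rows should raise an error. A minimal sketch of the same early-exit idea for that case, assuming the first-column values are handed over as a sequence; RowLimit and ExceedsLimit are hypothetical names. Counting stops as soon as the limit is passed, so an oversized file is rejected without scanning the rest of it or building any objects.

```csharp
using System;
using System.Collections.Generic;

public static class RowLimit
{
    // firstColumn stands in for ws.Cells[rw, 1].Value over the data rows.
    // Only non-null values are counted, matching the row filter in the question.
    public static bool ExceedsLimit(IEnumerable<object> firstColumn, int maxRows)
    {
        int count = 0;
        foreach (var value in firstColumn)
        {
            // Early exit: once the limit is passed, the remaining rows are never read
            if (value != null && ++count > maxRows)
                return true;
        }
        return false;
    }
}
```

With EPPlus, a lazy sequence such as Enumerable.Range(4, ws.Dimension.End.Row - 3).Select(rw => ws.Cells[rw, 1].Value) (from System.Linq, assuming the data starts at row 4 as in the question) would keep that early exit, because cells past the limit are never evaluated.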
