简体   繁体   中英

How to remove empty cells and rows from file using ExcelDataReader?

I'm using ExcelDataReader to read excel file but the file which being processed has 16k columns and 1200 records each. In between only 52 columns are filled other are empty so I want to remove empty columns with rows.

tried with :

var dataset = reader.AsDataSet(new ExcelDataSetConfiguration() {
    ConfigureDataTable = _ => new ExcelDataTableConfiguration() {
        FilterRow = rowReader => rowReader.Depth != 0
    }
});

help will be appreciated thank you

AFAIK, There is not any simple way to filter empty columns or rows for it my best try is:

// use ExcelDataTableConfiguration.FilterRow to filter empty rows
FilterRow = rowReader =>
{
    var hasData = false;
    for (var i = 0; i < rowReader.FieldCount; i++)
    {
        if (rowReader[i] == null || string.IsNullOrEmpty(rowReader[i].ToString()))
        {
            continue;
        }

        hasData = true;
        break;
    }

    return hasData;
},
// use ExcelDataTableConfiguration.FilterColumn to filter empty columns
FilterColumn = (rowReader, colIndex) =>
{
    var hasData = false;
    rowReader.Reset();

    // this will skip first row as it is name of column
    rowReader.Read();

    while (rowReader.Read())
    {
        if (rowReader[colIndex] == null || 
            string.IsNullOrEmpty(rowReader[colIndex].ToString()))
        {
            continue;
        }

        hasData = true;
        break;
    }

    // below codes do a trick!
    rowReader.Reset();
    rowReader.Read();

    return hasData;
}

Using IExcelDataReader arguments inside FilterRow or FilterColumn is evil!
I think in main reader and each rowReader are referenced to one object! so you should always aware of how to use them, as you can see I add a trick to make reader to be ready after using it. - HTH ;)

With a little inspiration from ShA.t's answer , I was able to make header rows and columns read exactly where they start in the excel document. You only need to filter the empty columns, as long as you know where the rows start. Here's a sample code:

public class TrimmedTableSample 
{
    #region Properties
    int HeaderRowIndex { get; set; }
    #endregion

    #region Methods
    public void Read(string documentPath)
    {
        using (var stream = File.Open(documentPath, FileMode.Open, FileAccess.Read))
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            var dataSet = reader.AsDataSet(new ExcelDataSetConfiguration()
            {
                UseColumnDataType = true,
                ConfigureDataTable = (tableReader) => new ExcelDataTableConfiguration()
                {
                    EmptyColumnNamePrefix = "Column ",
                    UseHeaderRow = true,
                    ReadHeaderRow = (reader) =>
                    {
                        bool empty = true;
                        HeaderRowIndex = 0;

                        while (empty)
                        {
                            for (var i = 0; i < reader.FieldCount && empty; i++)
                                empty = string.IsNullOrWhiteSpace(reader.GetString(i));

                            if (empty)
                            {
                                empty = reader.Read(); // Only continue if more content is available
                                HeaderRowIndex++; // Keep track of the first row position.
                            }
                        }
                    },
                    FilterColumn = (reader, index) =>
                    {
                        bool empty = false;
                        string sheet = reader.Name;

                        // Start reading the table from the beginning
                        reader.Reset();

                        // EDIT: Head over the our current excel sheet
                        while (reader.Name != sheet)
                        if (!reader.NextResult())
                            break;


                        // Head to the first row with content
                        int rowIndex = 0;
                        while (rowIndex < HeaderRowIndex)
                        {
                            reader.Read();
                            rowIndex++;
                        }

                        while (reader.Read())
                        {
                            // Decide if the current column is empty
                            if (reader[index] == null || string.IsNullOrEmpty(reader[index].ToString()))
                                continue;

                            empty = true;
                            break;
                        }

                        // Start over again (This allows the reader to automatically read the rest of the content itself)
                        reader.Reset();

                        // EDIT: Head over the our current excel sheet
                        while (reader.Name != sheet)
                        if (!reader.NextResult())
                            break;

                        reader.Read();

                        // Head over to the first row with content
                        rowIndex = 0;
                        while (rowIndex < HeaderRowIndex)
                        {
                            reader.Read();
                            rowIndex++;
                        }

                        // Return info on whether this column should be ignored or not.
                        return empty;
                    }
                }
            });  
        }
    }
    #endregion
}

Good Luck!

I use this solution that works by calling DataRow.Delete() method:

                    foreach (var row in rows)
                    {
                        var emptyColumnsCount = 0;
                        for (var i = 0; i < columns.Count; i++)
                        {
                            if (!row.IsNull(columns[i]))
                            {
                               // Do some logic to read row.
                            }
                            else
                            {
                                emptColumnsCount++;
                            }
                        }
                        if (emptyColumnsCount == columns.Count)
                            row.Delete();
                        else
                           // Do some logic to use row
                    }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM