简体   繁体   中英

C# Split DataTable by specific DataRow indexes

I tried to find any solution on stackoverflow, but i didn't found it.

I have a DataTable , obtained from excel sheet. This is not clear table, it contain many sub-tables (starts with headers) and other necessary information above of them (which may contain empty rows). For example:

Line1 : Other data...
[empty line]
Line2: Other data...
[empty line]
...................   
ColA   |   ColB   |   Type   |
------------------------------
AAA    |   BBB    |   IN     |
AAA    |   BBB    |   OUT    |
AAA    |   BBB    |   IN     |

Line1 : Other data...
[empty line]
Line2: Other data...
[empty line]
...................    
ColA   |   ColB   |   Type   |
------------------------------
AAA    |   BBB    |   IN     |
AAA    |   BBB    |   OUT    |
AAA    |   BBB    |   OUT    |

I want to split the data table into multiple data tables that begin with many rows of necessary information, then the table itself and ends with empty row. As a result, I have to get DataTable[] data type.

I tried to obtain indexes of the last row of the sections of the data table (if DataRow contain "in" or "out" and next index contain empty row), but i don't know if is a good code and a good solution for further split a data table:

var indexes = dataTable.AsEnumerable()
    .Where(x => x.ItemArray.All(rowCell => rowCell.ToString() == string.Empty))
    .Where(
       x => dataTable.Rows[dataTable.Rows.IndexOf(x) - 1].ItemArray.Any(
             item => Regex.Match(
                   item.ToString(),
                   "^in|out$",
                   RegexOptions.IgnoreCase).Success))
    .Select(x => dataTable.Rows.IndexOf(x)).ToArray();

I have two Where linq conditions for check whether exist empty row after row that contain " in " or " out " words.

How can I split the DataTable by these indexes? I want to find a similar Linq Expression for this purpose. As a result, I have to get the following tables:

Table #1

Line1 : Other data...
[empty line]
Line2: Other data...
[empty line]
...................   
ColA   |   ColB   |   Type   |
------------------------------
AAA    |   BBB    |   IN     |
AAA    |   BBB    |   OUT    |
AAA    |   BBB    |   IN     |

Table #2

Line1 : Other data...
[empty line]
Line2: Other data...
[empty line]
...................    
ColA   |   ColB   |   Type   |
------------------------------
AAA    |   BBB    |   IN     |
AAA    |   BBB    |   OUT    |
AAA    |   BBB    |   OUT    |

I know how to process this data further, but I don't know how to split the data table.

var selectedIndexes = from row in dataTable.AsEnumerable()
                          where (row.Text.Equals("IN") || row.Text.Equals("OUT"))
                                 && dataTable.Rows[dataTable.Rows.IndexOf(row) - 1]
                          select dataTable.Rows.IndexOf(row);

Might do the trick, although this is a non-tested code.

EDIT:

Since your updated question tells me you are dealing with csv files I am strongly suggesting CSVHelper Nuget for you.

The documentation can be found here:

https://joshclose.github.io/CsvHelper/

Finally, I found the solution. I was inspired by the following answer of the question: Split datatable into multiple fixed sized tables

I have changed a bit the indexes identification code, according to the MwBakker user's suggestion.

var indexes = dataTable.AsEnumerable()
      .Where(x => x.ItemArray.All(rowCell => rowCell.ToString() == string.Empty))
      .Where(
         x => dataTable.Rows[dataTable.Rows.IndexOf(x) - 1].ItemArray.Any(
           item => string.Equals(item.ToString(), "In")
                   || string.Equals(item.ToString(), "Out")))
      .Select(x => dataTable.Rows.IndexOf(x)).ToArray();

I created the extension method which split the data table rows by indexes:

public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> enumerable, int[] indexes)
    {
        // Sort array.
        indexes = indexes.OrderBy(x => x).ToArray();
        var itemsReturned = 0;
        var list = enumerable.ToList();
        var count = list.Count;
        short i = 0;

        while (itemsReturned < count)
        {
            int currentChunkSize = i < indexes.Length
                                       ? Math.Min(indexes[i] - itemsReturned, count - itemsReturned)
                                       : count - itemsReturned;
            yield return list.GetRange(itemsReturned, currentChunkSize);
            itemsReturned += currentChunkSize;
            i++;
        }
    }

Next, i made a foreach loop:

var dataTables = dataTable.AsEnumerable().Split(indexes)
    .Select(rows => rows.CopyToDataTable());    

foreach (var table in dataTables)
{
    // ...
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM