
How do I remove rows from a DataTable up to a certain date?

I am having trouble filtering a DataTable, say DtFromExcel. The DataTable does NOT have a header row; it starts with an actual data row and looks something like the following.

1 | 05/01/2020 Fri | ABC | XYZ | ...
2 | 05/01/2020 Fri | AAA | WKV | ...
3 | 05/02/2020 Sat | BCD | OPQ | ...
4 | 05/03/2020 Sun | CDE | RST | ...
5 | 05/03/2020 Sun | EFA | FAY | ...
6 | 05/03/2020 Sun | AXG | EAS | ...
7 | 05/04/2020 Mon | DEF | LMN | ...
8 | 05/04/2020 Mon | SXA | YTR | ...
9 | 05/05/2020 Tue | DAF | AAG | ...

The second column contains a date followed by some extra text (the day of the week), and the rows are ordered by this date column. There can be multiple rows with the same date.

Now, I want to delete the rows where the date column contains a certain date AND any rows prior to that. For example, if the certain date is 05/04/2020, then I need to delete all rows up to the 8th row, so that the remaining DataTable would look like

9 | 05/05/2020 Tue | DAF | AAG | ...

My problem is, first, that I don't know how to filter the DataTable without a column name. I thought about assigning a header row without overwriting the first actual data row, but that seems like a lot of work just to filter. Second, I am not sure how to express these conditions: (a) the second column contains a certain date, AND (b) any row with a date prior to that certain date.

private void DeleteRows(DateTime certainDate)
{
    DataRow[] targetRowsToDelete = DtFromExcel.Select(/* Not sure what to put in here */);
    foreach (DataRow row in targetRowsToDelete)
    {
        // The date is the first space-separated token of the second column.
        if (Convert.ToDateTime(row[1].ToString().Split(' ')[0]) <= certainDate)
            DtFromExcel.Rows.Remove(row);
    }
}

I did not want to loop through the whole DataTable because this process occurs often in my program.

If you use the empty constructor to create a DataColumn with no name, the documentation states...

When created, a DataColumn object has no default ColumnName or Caption. When you add it to a DataColumnCollection, a default name ("Column1", "Column2", and so on) will be generated if a name has not been assigned to the ColumnName.

...so creating and loading a DataTable like this...

const string Input = @"1 | 05/01/2020 Fri | ABC | XYZ | ...
2 | 05/01/2020 Fri | AAA | WKV | ...
3 | 05/02/2020 Sat | BCD | OPQ | ...
4 | 05/03/2020 Sun | CDE | RST | ...
5 | 05/03/2020 Sun | EFA | FAY | ...
6 | 05/03/2020 Sun | AXG | EAS | ...
7 | 05/04/2020 Mon | DEF | LMN | ...
8 | 05/04/2020 Mon | SXA | YTR | ...
9 | 05/05/2020 Tue | DAF | AAG | ...";
DtFromExcel = new DataTable();

for (int i = 0; i < 5; i++)
{
    DataColumn column = new DataColumn();
    Console.WriteLine($"Column {i} has ColumnName \"{column.ColumnName}\"");

    DtFromExcel.Columns.Add(column);
    Console.WriteLine($"Column {i} has ColumnName \"{column.ColumnName}\"");
}

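// Accept both "\r\n" and "\n", since the line endings inside the verbatim string depend on the source file.
foreach (string line in Input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None))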
{
    string[] fields = line.Split(" | ");

    DtFromExcel.Rows.Add(fields);
}

...produces this output...

Column 0 has ColumnName ""
Column 0 has ColumnName "Column1"
Column 1 has ColumnName ""
Column 1 has ColumnName "Column2"
Column 2 has ColumnName ""
Column 2 has ColumnName "Column3"
Column 3 has ColumnName ""
Column 3 has ColumnName "Column4"
Column 4 has ColumnName ""
Column 4 has ColumnName "Column5"

...so you could always use those default names. Further, just because your input data doesn't specify column/field names doesn't mean you can't do so after it's been loaded into the DataTable...

DtFromExcel.Columns[1].ColumnName = "MyDateColumn";

Either way, you'll have a known name by which you can refer to that column.
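As a minimal sketch of that point, the following builds a small table with unnamed columns and then filters it with Select(), first by a default name and then by an assigned one. The class name DefaultColumnNameSketch, the helper names BuildTable and CountMatches, and the three sample rows are hypothetical and exist only for illustration.

```csharp
using System;
using System.Data;

public static class DefaultColumnNameSketch
{
    // Builds a three-column table whose columns are created without names,
    // so they receive the default names Column1, Column2 and Column3.
    public static DataTable BuildTable()
    {
        var table = new DataTable();
        for (int i = 0; i < 3; i++)
            table.Columns.Add(new DataColumn());

        table.Rows.Add("1", "05/01/2020 Fri", "ABC");
        table.Rows.Add("2", "05/01/2020 Fri", "AAA");
        table.Rows.Add("3", "05/02/2020 Sat", "BCD");
        return table;
    }

    // Returns how many rows match a filter expression passed to Select().
    public static int CountMatches(DataTable table, string filter) =>
        table.Select(filter).Length;
}
```

With this table, CountMatches(table, "Column3 = 'ABC'") matches one row. After table.Columns[1].ColumnName = "MyDateColumn", the filter "MyDateColumn LIKE '05/01/2020*'" matches the two rows dated 05/01/2020.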

As to your comment about not wanting to "loop through the whole DataTable", it's not clear whether you mean the additional code or the performance implications. On the latter point, even if you don't explicitly loop through and test every DataRow, Select() will. On that note, since you say the rows are ordered by date, you can exploit that using LINQ to stop scanning rows as soon as a date outside the search range is found...

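// Requires "using System.Globalization;". The invariant culture keeps the "ddd"
// day-name parsing (Fri, Sat, ...) independent of the machine's current culture.
private static DateTime GetRowDate(DataRow row) => DateTime.ParseExact(
    (string) row["MyDateColumn"], "MM/dd/yyyy ddd", CultureInfo.InvariantCulture
);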

private void DeleteRows(DateTime maxDate)
{
    DataRow[] rowsToRemove = DtFromExcel.AsEnumerable()
        .TakeWhile(row => GetRowDate(row) <= maxDate)
        .ToArray(); // Required to prevent "Collection was modified" exception in foreach below

    foreach (DataRow row in rowsToRemove)
        DtFromExcel.Rows.Remove(row);
}

If your rows aren't guaranteed to be sorted by date, then you can substitute Where() for TakeWhile() and it will produce the same result, at the cost of testing every row.
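A rough sketch of that Where() variant follows, assuming the date string sits in the second column (index 1) and uses the same "MM/dd/yyyy ddd" layout as above. The class name UnsortedDeleteSketch, the table parameter, and the returned count are hypothetical additions for illustration; the date is parsed with the invariant culture so the day-name abbreviations don't depend on machine settings.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;

public static class UnsortedDeleteSketch
{
    // Parses values such as "05/04/2020 Mon" from the second column.
    private static DateTime GetRowDate(DataRow row) => DateTime.ParseExact(
        (string) row[1], "MM/dd/yyyy ddd", CultureInfo.InvariantCulture);

    // Removes every row dated on or before maxDate, regardless of row order,
    // and returns the number of rows removed.
    public static int DeleteRows(DataTable table, DateTime maxDate)
    {
        DataRow[] rowsToRemove = table.AsEnumerable()
            .Where(row => GetRowDate(row) <= maxDate) // tests every row
            .ToArray(); // snapshot, so removing rows doesn't disturb the enumeration

        foreach (DataRow row in rowsToRemove)
            table.Rows.Remove(row);

        return rowsToRemove.Length;
    }
}
```

Unlike TakeWhile(), this does not stop at the first row past maxDate, so it also catches matching rows that appear later in the table.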

As for your original request to use DataTable.Select(), I'm not sure if that's even feasible here since your dates appear to be stored as string, not DateTime, in your DataColumn. I see that the expression syntax supports a CONVERT() function that can convert between String and DateTime, but I can't imagine that would be any more performant or readable than LINQ, so I wouldn't pursue that unless you absolutely have to.
