
Convert a list of key/value pairs to datatable

I'm working on a parser. It gets values from source text. It does not know beforehand how many or which values it will get, i.e. the names of the variables, their count, etc. can vary greatly. Each section of the source provides only some of the values, not a complete list. Those values are currently stored in a list of a custom class, similar to KeyValuePair but written from scratch.

Sample of what is retrieved from the source:

Section 1:
    KeyA = ValA1
    KeyB = ValB1
    KeyD = ValD1
Section 2:
    KeyC = ValC2
Section 3:
    KeyB = ValB3
    KeyD = ValD3

etc.

Now, I'd like to show this information to user as a DataGrid in form of:

| KeyA  | KeyB  | KeyC  | KeyD  |
+-------+-------+-------+-------+
| ValA1 | ValB1 |       | ValD1 |
|       |       | ValC2 |       |
|       | ValB3 |       | ValD3 |

Currently, I iterate through all values found in each section and check whether a column for the key exists. If it does not, I create a new column. If it does, I add the value to the respective row/column. Then I attach the resulting DataTable to the DataGrid:

dg.ItemsSource = dt.AsDataView();

This works exactly as intended, but it is too slow.

I'd appreciate any thoughts on how I could speed that up: the initial storing, the conversion to DataTable, or some other way of binding the data that achieves the same presentation to the user.

C#, WPF, .NET framework 4.5

Update: All loading and processing is done beforehand. The ready data is stored as a tree of processed sections. Each section holds, as one of its properties, a list of key/value pairs. Each section has a method to populate a given DataTable with its values.

I.e. the data on the backend looks like:

File1
  + Section 1 on level 1
  |   + Section 1
  |   + Section 2
  + Section 2 on level 1
  + Section 3 on level 1
  |   + Section 1
  |   + Section 2
  |   + Section 3
  |   + Section 4
  + Section 4
File2 ...

Each Section has a method:

public void CollectValues(DataTable target) {...}

It is called by the higher-level element with some DataTable (initially empty, and filled as the traversal goes).

Each section contains an internal variable:

private List<CustomValue> Values;

It holds all the values already found and processed, as instances of the CustomValue class. CustomValue ~= KeyValuePair, but with added processing routines.

So what happens is that CollectValues is called from the requested level (could be the top, could be any other) with an empty, unprepared DataTable. CollectValues iterates (foreach) through all values in the list on the current level and adds them to the target DataTable one at a time. Before adding each value it checks whether a DataColumn with the needed name exists (target.Columns[Value.Key] != null) and creates the column first if it does not. In metacode:

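public void CollectValues(DataTable target)
{
    // One row per section; NewRow() creates a detached row with the table's schema.
    DataRow dr = target.NewRow();
    foreach(var pair in Values)
    {
        // Create the column the first time a key is seen.
        if(!target.Columns.Contains(pair.Key)) target.Columns.Add(pair.Key);
        dr[pair.Key] = pair.Value;
    }
    // The row only becomes part of the table once it is added.
    target.Rows.Add(dr);
    foreach(var child in Children)
        child.CollectValues(target);
}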

Why this specific part? Collecting values is just one of several similar routines. The others crawl the same data set in the same way to retrieve other things (mostly working with lists, no DataTables), and all of them finish near-instantly. Building the DataTable, though, can take a few seconds per source before the resulting DataGrid is populated.

The number of values rarely exceeds 1000 (e.g. 10 columns by 100 rows). The DataTable is attached to the DataGrid only after it has been fully populated.

Just for info on sizes: there are usually 2 to 10 source files. Each source text can range from 100 KB to 100 MB; the usual file size is around 1-2 MB. The backend data in memory is usually under 100 MB.

And to highlight again: it's only the DataTable that worries me. Highlighting, sectioning, source retrieval, filtering, etc. all perform within my expectations. So I'm looking first of all for a way to optimize the conversion from a list of key/value pairs to a DataTable, or for a way to store those values differently in the first place (after processing) to speed up the process.

I hope this gives enough info. I'm not listing the source for now, to keep the post short.

I'd look for a data structure other than a DataTable to use here. It sounds to me like what you need is a Dictionary<string, Dictionary<int, CustomValue>> . The string is your column name, the int is an ID for the row of data, and CustomValue is the data itself.

public void CollectValues(Dictionary<string, Dictionary<int, CustomValue>> target)
{
    foreach(var pair in Values)
    {
        // The indexer throws on a missing key, so test with ContainsKey rather than comparing to null.
        if(!target.ContainsKey(pair.Key)) target.Add(pair.Key, new Dictionary<int, CustomValue>());
        target[pair.Key].Add(pair.ID, pair.Value);
    }
    foreach(var child in Children)
        child.CollectValues(target);
}

If you don't already have a pair.ID in place, you can just use a counter variable (either static or passed with each call) so that each object has a different ID.
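As a rough illustration of the "counter passed with each call" option, here is a minimal sketch. The Section and CustomValue classes below are hypothetical stand-ins for the asker's types (only a Key and a Value are assumed), and the counter is passed by ref so that each section in the tree gets its own row number. It stores the CustomValue itself in the inner dictionary, which is one possible reading of the code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's class; only Key and Value are assumed.
public class CustomValue
{
    public string Key;
    public string Value;
    public CustomValue(string key, string value) { Key = key; Value = value; }
}

// Hypothetical stand-in for one node of the section tree.
public class Section
{
    public List<CustomValue> Values = new List<CustomValue>();
    public List<Section> Children = new List<Section>();

    // target maps column name -> (row id -> value).
    // nextRow is shared by the whole traversal, so every section gets a unique row id.
    public void CollectValues(Dictionary<string, Dictionary<int, CustomValue>> target, ref int nextRow)
    {
        int row = nextRow++;
        foreach (var pair in Values)
        {
            Dictionary<int, CustomValue> column;
            if (!target.TryGetValue(pair.Key, out column))
            {
                // First time this key is seen: create its column.
                column = new Dictionary<int, CustomValue>();
                target.Add(pair.Key, column);
            }
            column[row] = pair;
        }
        foreach (var child in Children)
            child.CollectValues(target, ref nextRow);
    }
}
```

The caller would start with an empty dictionary and `int nextRow = 0;`, then call `root.CollectValues(target, ref nextRow);`. Afterwards `nextRow` holds the number of rows, and a cell that was never set is simply absent from its column's inner dictionary.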


It might make more sense to store the values by row, with the columns that each set of data has, rather than the reverse. That would be an IEnumerable<Dictionary<string, CustomValue>> , with each Dictionary representing one row. You would pull out all the columns with target.SelectMany(x => x.Keys).Distinct() .
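If a DataTable is still wanted at the end, the row-oriented shape makes it possible to create every column up front and then fill the rows in a single pass. The following is only a sketch under assumptions: values are plain strings rather than CustomValue, the class and method names are made up, and the columns are sorted by name purely to get a stable order. Whether it is actually faster for a given data set would need to be profiled.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowTable
{
    // Builds a DataTable from sparse rows (column name -> value).
    public static DataTable ToDataTable(IEnumerable<Dictionary<string, string>> rows)
    {
        var rowList = rows.ToList();
        var table = new DataTable();

        // Pass 1: discover all column names and create the columns once.
        var names = rowList.SelectMany(r => r.Keys)
                           .Distinct()
                           .OrderBy(n => n, StringComparer.Ordinal);
        foreach (var name in names)
            table.Columns.Add(name, typeof(string));

        // Pass 2: fill the rows. BeginLoadData/EndLoadData suspend
        // notifications, index maintenance and constraints while loading.
        table.BeginLoadData();
        foreach (var row in rowList)
        {
            DataRow dr = table.NewRow();
            foreach (var cell in row)
                dr[cell.Key] = cell.Value;
            table.Rows.Add(dr);
        }
        table.EndLoadData();
        return table;
    }
}
```

Cells that a row does not provide are left as DBNull, which the DataGrid shows as empty.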

DataTable is slow. It does a lot of stuff.

If all your values are strings, I would create collections like these:

List<string> ColNames;               // one entry per column
List<string> ColValues;              // the cells of one row, in ColNames order

List<List<string>> RowsColValues;    // one ColValues list per row

Then you need to bind the columns to the DataGrid manually, using ColValues[i] syntax.

And for speed, use a ListView with a GridView for this.
DataGrid is slow and bulky compared to GridView.
But GridView does not support editing.
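A minimal sketch of what that manual binding might look like with a ListView/GridView, assuming each row is a List<string> with exactly one slot per column. The class and method names here are hypothetical. ToDenseRows is a helper added for illustration that pads missing cells with an empty string so that the "[i]" indexer path always resolves.

```csharp
using System.Collections.Generic;
using System.Windows.Controls;
using System.Windows.Data;

public static class GridViewBuilder
{
    // Converts sparse rows (column name -> value) into dense rows,
    // one slot per column, in the order given by colNames.
    public static List<List<string>> ToDenseRows(
        IList<string> colNames, IEnumerable<Dictionary<string, string>> sparseRows)
    {
        var result = new List<List<string>>();
        foreach (var sparse in sparseRows)
        {
            var dense = new List<string>(colNames.Count);
            foreach (var name in colNames)
            {
                string value;
                dense.Add(sparse.TryGetValue(name, out value) ? value : "");
            }
            result.Add(dense);
        }
        return result;
    }

    // Creates one GridViewColumn per name, each bound to the list indexer "[i]".
    public static void Bind(ListView listView, IList<string> colNames, List<List<string>> rows)
    {
        var view = new GridView();
        for (int i = 0; i < colNames.Count; i++)
        {
            view.Columns.Add(new GridViewColumn
            {
                Header = colNames[i],
                DisplayMemberBinding = new Binding("[" + i + "]")
            });
        }
        listView.View = view;
        listView.ItemsSource = rows;
    }
}
```

Bind has to be called on the UI thread, like any other code that touches WPF controls.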

Not making this up.
I do exactly this, but in a different scenario:
the user selects the columns they want to see.

DynamicColumns
