
Convert a list of key/value pairs to DataTable

I'm working on a parser. It gets values from source text. It does not know beforehand how many or which values it will get, i.e. the names of variables, their count, etc. could vary greatly. Each section of the source provides only some values, not a complete list. Those values are currently stored in a list of a custom class, similar to KeyValuePair but written from scratch.

A sample of what is retrieved from the source:

Section 1:
    KeyA = ValA1
    KeyB = ValB1
    KeyD = ValD1
Section 2:
    KeyC = ValC2
Section 3:
    KeyB = ValB3
    KeyD = ValD3

etc.

Now, I'd like to show this information to the user as a DataGrid in the form of:

| KeyA  | KeyB  | KeyC  | KeyD  |
+-------+-------+-------+-------+
| ValA1 | ValB1 |       | ValD1 |
|       |       | ValC2 |       |
|       | ValB3 |       | ValD3 |

Currently, I iterate through all values found in each section and check whether the column exists, creating a new column if it does not. Then I add the value to the respective row/column. Finally, I attach the resulting DataTable to the DataGrid as:

dg.ItemsSource = dt.AsDataView();

This works exactly as intended, but it is too slow.

I'd appreciate any thoughts on how I could speed that up: either the initial storing, or the conversion to DataTable, or some other way of binding the data to achieve the same presentation to the user.

C#, WPF, .NET Framework 4.5

Update: All loading and processing is done beforehand. The ready data is stored as a tree of processed sections. Each section holds, as one of its properties, a list of key/value pairs. Each section has a method to populate a given DataTable with its values.

I.e. the data on the backend looks like:

File1
  + Section 1 on level 1
  |   + Section 1
  |   + Section 2
  + Section 2 on level 1
  + Section 3 on level 1
  |   + Section 1
  |   + Section 2
  |   + Section 3
  |   + Section 4
  + Section 4
File2 ...

Each section has a method:

public void CollectValues(DataTable target) {...}

It is called by a higher-level element with some DataTable (initially empty, and filled as it goes).

Each section contains an internal variable:

private List<CustomValue> Values;

It holds all the already found and processed values as instances of the CustomValue class. CustomValue ~= KeyValuePair, but with added processing routines.

So what happens is that CollectValues is called from the requested level (could be the top, could be any other) with an empty, unprepared DataTable. CollectValues iterates (foreach) through all available values in the list on the current level and adds them to the target DataTable one at a time. Before adding each value, it checks whether a DataColumn with the needed name exists (target[Value.Key]!=null) and creates the column if needed. In metacode:

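public void CollectValues(DataTable target)
{
    // One row per section; NewRow() creates a detached row for this table.
    DataRow dr = target.NewRow();
    foreach (var pair in Values)
    {
        // Create the column the first time its key is seen.
        if (!target.Columns.Contains(pair.Key)) target.Columns.Add(...);
        dr[pair.Key] = pair.Value;
    }
    target.Rows.Add(dr);
    foreach (var child in Children)
        child.CollectValues(target);
}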

Why this specific part: collecting values is just one of several similar routines. Other routines crawl the same data set in a similar way, retrieving other things (mostly working with lists, no DataTables), and all of them work near-instantly. Collecting the DataTable, though, might take a few seconds per source before the resulting DataGrid gets populated.

The number of values rarely exceeds 1000 (like, 10 columns by 100 rows). The DataTable is attached to the DataGrid only after it has been fully populated.

Just for info on sizes: sources are usually 2 to 10 files. Each source text can range from 100 KB to 100 MB in size. The usual file size is around 1-2 MB. The size of the backend data in memory is usually under 100 MB.

And to highlight again: it's only the DataTable that worries me. Highlighting, sectioning, source retrieval, filtering, etc. all work within my expectations. So I'm looking first of all for a way to optimize the conversion from a list of key/value pairs to a DataTable, or for a way to store those values differently initially (after processing) to speed up the process.

Hope this gives enough info. I'm not listing the source currently, to reduce size.

I'd look for a data structure other than a DataTable to use here. It sounds to me like what you need is a Dictionary<string, Dictionary<int, CustomValue>>. The string is your column name, the int is an ID for the row of data, and CustomValue is the data itself.

public void CollectValues(Dictionary<string, Dictionary<int, CustomValue>> target)
{
    foreach (var pair in Values)
    {
        // The indexer throws on a missing key, so test with ContainsKey first.
        if (!target.ContainsKey(pair.Key)) target.Add(pair.Key, new Dictionary<int, CustomValue>());
        target[pair.Key].Add(pair.ID, pair.Value);
    }
    foreach (var child in Children)
        child.CollectValues(target);
}

If you don't already have a pair.ID in place, you can just use a counter variable (either static or passed with each call) so that each object has a different ID.
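As a rough, self-contained sketch of the column-keyed dictionary with a counter passed along each call: the `CustomValue` and `Section` classes below are hypothetical stand-ins for the question's types, and the choice of one row ID per section (rather than per value) is an assumption, not something the question confirms.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's CustomValue class.
public class CustomValue
{
    public string Key { get; set; }
    public string Value { get; set; }
}

// Hypothetical stand-in for a parsed section.
public class Section
{
    public List<CustomValue> Values = new List<CustomValue>();
    public List<Section> Children = new List<Section>();

    // rowId is passed by reference so that every section in the tree
    // receives its own row ID (assumption: one row per section).
    public void CollectValues(
        Dictionary<string, Dictionary<int, CustomValue>> target, ref int rowId)
    {
        int myRow = rowId++;
        foreach (var pair in Values)
        {
            Dictionary<int, CustomValue> column;
            if (!target.TryGetValue(pair.Key, out column))
            {
                column = new Dictionary<int, CustomValue>();
                target.Add(pair.Key, column);
            }
            // The indexer overwrites if a key repeats within one section.
            column[myRow] = pair;
        }
        foreach (var child in Children)
            child.CollectValues(target, ref rowId);
    }
}

public static class ColumnStoreDemo
{
    // The three sample sections from the question.
    public static List<Section> SampleSections()
    {
        return new List<Section>
        {
            new Section { Values = {
                new CustomValue { Key = "KeyA", Value = "ValA1" },
                new CustomValue { Key = "KeyB", Value = "ValB1" },
                new CustomValue { Key = "KeyD", Value = "ValD1" } } },
            new Section { Values = {
                new CustomValue { Key = "KeyC", Value = "ValC2" } } },
            new Section { Values = {
                new CustomValue { Key = "KeyB", Value = "ValB3" },
                new CustomValue { Key = "KeyD", Value = "ValD3" } } }
        };
    }

    public static Dictionary<string, Dictionary<int, CustomValue>> Collect(
        IEnumerable<Section> sections)
    {
        var target = new Dictionary<string, Dictionary<int, CustomValue>>();
        int rowId = 0;
        foreach (var section in sections)
            section.CollectValues(target, ref rowId);
        return target;
    }

    public static void Main()
    {
        foreach (var column in Collect(SampleSections()))
            foreach (var cell in column.Value)
                Console.WriteLine(column.Key + "[" + cell.Key + "] = " + cell.Value.Value);
    }
}
```

Using `TryGetValue` avoids a second lookup per value; whether this is measurably faster than the DataTable for about 1000 values would need profiling.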

It might make more sense to store the values by row, with the columns that each set of data has, rather than the reverse. That would be an IEnumerable<Dictionary<string, CustomValue>>, with each Dictionary representing one row. You would pull out all the columns with target.SelectMany(x => x.Keys).Distinct().
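If a DataTable is still wanted at the end, the row-wise storage could be converted in two passes: create every column once up front, then fill the rows. The sketch below assumes plain string values in place of CustomValue, and `RowStoreDemo` and `ToDataTable` are illustrative names, not part of the question's code. `BeginLoadData` turns off notifications, index maintenance, and constraints while loading, which may help, though the gain is not measured here.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowStoreDemo
{
    // Builds a DataTable from row dictionaries in two passes:
    // first the schema, then the data.
    public static DataTable ToDataTable(IEnumerable<Dictionary<string, string>> rows)
    {
        var rowList = rows.ToList();
        var table = new DataTable();

        // Pass 1: every distinct key becomes a column, created exactly once.
        foreach (var name in rowList.SelectMany(r => r.Keys).Distinct())
            table.Columns.Add(name, typeof(string));

        // Pass 2: fill the rows; cells without a value stay DBNull.
        table.BeginLoadData();
        foreach (var row in rowList)
        {
            DataRow dr = table.NewRow();
            foreach (var cell in row)
                dr[cell.Key] = cell.Value;
            table.Rows.Add(dr);
        }
        table.EndLoadData();
        return table;
    }

    // The three sample sections from the question, one dictionary per row.
    public static List<Dictionary<string, string>> SampleRows()
    {
        return new List<Dictionary<string, string>>
        {
            new Dictionary<string, string> {
                { "KeyA", "ValA1" }, { "KeyB", "ValB1" }, { "KeyD", "ValD1" } },
            new Dictionary<string, string> { { "KeyC", "ValC2" } },
            new Dictionary<string, string> {
                { "KeyB", "ValB3" }, { "KeyD", "ValD3" } }
        };
    }

    public static void Main()
    {
        DataTable table = ToDataTable(SampleRows());
        Console.WriteLine(table.Columns.Count + " columns, " + table.Rows.Count + " rows");
    }
}
```

Because the schema is fixed before any row is added, no column check is needed inside the inner loop.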

DataTable is slow. It does a lot of stuff.

If your values are all strings, then I would create a collection:

List<String> ColNames;
List<String> ColValues;   // the values of one row, in ColNames order

List<List<String>> RowsColValues;   // one ColValues list per row

Then you need to manually bind the columns to the DataGrid using ColValues[i] syntax.
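One way the manual binding could look, sketched for a ListView with a GridView rather than a DataGrid: each column is bound with WPF's indexer path syntax, so column i reads element i of the row's list. `GridViewBuilder` and `Build` are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.Windows.Controls;
using System.Windows.Data;

public static class GridViewBuilder
{
    // Creates one GridViewColumn per column name, bound to position i
    // of each row list via the indexer path "[i]".
    public static GridView Build(IList<string> colNames)
    {
        var view = new GridView();
        for (int i = 0; i < colNames.Count; i++)
        {
            view.Columns.Add(new GridViewColumn
            {
                Header = colNames[i],
                DisplayMemberBinding = new Binding("[" + i + "]")
            });
        }
        return view;
    }
}
```

Assuming a ListView named `listView` (hypothetical), usage would be along the lines of `listView.View = GridViewBuilder.Build(ColNames); listView.ItemsSource = RowsColValues;`. Every row list should contain one entry per column, with an empty string for a missing value, so that each index resolves.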

And for speed, use a ListView with a GridView for this.
DataGrid is slow and bulky compared to GridView.
But GridView does not support editing.

I'm not making this up.
I do exactly this, but in a different scenario:
the user selects the columns they want to see.

DyamicColumns
