
Algorithm to remove all objects of a tree from a list

I have a problem where I need to remove all objects of a tree from a list.

I have a List<String> Tags which contains all the tags in my system that match a certain criterion (generally, that they start with some search string). I also have a root Device object. The Device class is described as follows:

public class Device
{
    public int ID;
    public String Tag;
    public EntityCollection<Device> ChildDevices;
}

My attempt uses a breadth-first search: I remove each node's tag from the list as the node is visited, then return whatever is left over:

private List<String> RemoveInvalidTags(Device root, List<String> tags)    
{
    var queue = new Queue<Device>();
    queue.Enqueue(root);

    while (queue.Count > 0)
    {
        var device = queue.Dequeue();
        //load all the child devices of this device from DB
        var childDevices = device.ChildDevices.ToList();

        // ChildDevices holds Device objects, so each child is enqueued directly
        foreach (var childDevice in childDevices)
            queue.Enqueue(childDevice);

        tags.Remove(device.Tag);
    }

    return tags;
}

At the moment I am visiting 2000+ device nodes and removing their tags from a list of about 1400 tags (already reduced by the search string). This takes about 4 seconds, which is far too long.

I have tried changing the list of tags to a HashSet, but it brings a negligible speed improvement.

Any ideas for an algorithm or change that I could use to make this faster?

I'm going to guess that your tree is fairly "fat": each of your nodes has MANY children, but there are not many layers. If that is the case, give depth-first search a try. You should reach the bottom quickly and then be able to start removing nodes. You still have to visit every node, but you won't have to store as much intermediate data as you would with BFS.
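A minimal sketch of the depth-first variant, assuming an in-memory Device whose ChildDevices is a plain List<Device> (a hypothetical stand-in for the EntityCollection<Device>, so the sketch runs without Entity Framework). The only structural change from the question's code is replacing the Queue with a Stack; every node is still visited, and the tag removal is still List.Remove.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the question's Device entity.
public class Device
{
    public int ID;
    public string Tag;
    public List<Device> ChildDevices = new List<Device>();
}

public static class DepthFirstTagRemover
{
    // Depth-first traversal with an explicit stack: the most recently
    // discovered child is processed next, so the walk goes deep before wide.
    public static List<string> RemoveInvalidTags(Device root, List<string> tags)
    {
        var stack = new Stack<Device>();
        stack.Push(root);

        while (stack.Count > 0)
        {
            var device = stack.Pop();
            tags.Remove(device.Tag); // still a linear scan of the list

            foreach (var child in device.ChildDevices)
                stack.Push(child);
        }

        return tags;
    }
}
```

For a fat, shallow tree the stack holds roughly one batch of children per level on the current path, whereas the BFS queue can hold an entire level at once; the total work is unchanged.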

You should definitely be using some sort of hash table (sorry, I'm not familiar with the specifics of C#) for accessing tags.
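In C# the standard hash-based set is HashSet<string>. A minimal sketch of the question's BFS with the tags held in one, again assuming a hypothetical in-memory Device with a List<Device> of children:

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the question's Device entity.
public class Device
{
    public int ID;
    public string Tag;
    public List<Device> ChildDevices = new List<Device>();
}

public static class HashSetTagRemover
{
    public static List<string> RemoveInvalidTags(Device root, IEnumerable<string> tags)
    {
        // HashSet<string>.Remove is an average O(1) hash lookup,
        // unlike List<string>.Remove, which scans the list linearly.
        var remaining = new HashSet<string>(tags);

        var queue = new Queue<Device>();
        queue.Enqueue(root);

        while (queue.Count > 0)
        {
            var device = queue.Dequeue();
            remaining.Remove(device.Tag);

            foreach (var child in device.ChildDevices)
                queue.Enqueue(child);
        }

        // Note: a HashSet does not preserve the original order or duplicates.
        return new List<string>(remaining);
    }
}
```

If the question's measured HashSet change made little difference, that suggests the time is going elsewhere (for example into loading children), which profiling would confirm.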

I am curious about how the child devices are loaded from the DB. Since you are iterating over the entire tree, you might be able to load more appropriately sized chunks into memory. If the tree is very wide, the breadth-first search might load most of the tree into memory before it starts removing nodes from the queue.
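One way to load "more appropriately sized chunks" is to fetch all device rows in a single query and build the parent/child relation in memory. The sketch below assumes a hypothetical flat DeviceRow with a ParentID column; the question does not show the table schema, so both the type and the column are assumptions, and how the rows are fetched depends on your data layer.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row from one query over the device table.
// ParentID is an assumed column (null for top-level devices).
public record DeviceRow(int ID, int? ParentID, string Tag);

public static class BulkLoadedTree
{
    public static List<string> RemoveInvalidTags(
        int rootId, IEnumerable<DeviceRow> rows, IEnumerable<string> tags)
    {
        var rowList = rows.ToList();
        var byId = rowList.ToDictionary(r => r.ID);

        // Group children under their parent once, in memory,
        // instead of issuing one query per visited node.
        var childrenOf = rowList
            .Where(r => r.ParentID.HasValue)
            .ToLookup(r => r.ParentID.GetValueOrDefault());

        var remaining = new HashSet<string>(tags);
        var queue = new Queue<int>();
        queue.Enqueue(rootId);

        while (queue.Count > 0)
        {
            var id = queue.Dequeue();
            remaining.Remove(byId[id].Tag);

            // ILookup returns an empty sequence for a key with no children.
            foreach (var child in childrenOf[id])
                queue.Enqueue(child.ID);
        }

        return remaining.ToList();
    }
}
```

This trades one round trip per node for one round trip in total, at the cost of holding all rows in memory; with a few thousand devices that is usually small.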

You can use Stopwatch to find the bottleneck. If you ask me, this

var childDevices = device.ChildDevices.ToList();

foreach (var childDevice in childDevices)
   queue.Enqueue(childDevice);

is your bottleneck.
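A sketch of how System.Diagnostics.Stopwatch could separate the two suspects, loading children versus removing tags, inside the question's loop. It uses a hypothetical in-memory Device; with the real EntityCollection the first timed section is where lazy loading would hit the database.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical in-memory stand-in for the question's Device entity.
public class Device
{
    public int ID;
    public string Tag;
    public List<Device> ChildDevices = new List<Device>();
}

public static class TimedTraversal
{
    // Same BFS as the question, with accumulated timings for each phase.
    public static List<string> RemoveInvalidTags(
        Device root, List<string> tags,
        out TimeSpan childLoadTime, out TimeSpan removeTime)
    {
        var loadWatch = new Stopwatch();
        var removeWatch = new Stopwatch();
        var queue = new Queue<Device>();
        queue.Enqueue(root);

        while (queue.Count > 0)
        {
            var device = queue.Dequeue();

            loadWatch.Start(); // Start() resumes; Elapsed accumulates
            var childDevices = device.ChildDevices.ToList();
            loadWatch.Stop();

            foreach (var child in childDevices)
                queue.Enqueue(child);

            removeWatch.Start();
            tags.Remove(device.Tag);
            removeWatch.Stop();
        }

        childLoadTime = loadWatch.Elapsed;
        removeTime = removeWatch.Elapsed;
        return tags;
    }
}
```

Printing the two TimeSpan values (for example with Console.WriteLine) after one run on the real data shows which phase accounts for the 4 seconds.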

Look at this Tree implementation in C#; I hope you already know about tree traversals.

Why don't you try this?

foreach (var childDevice in device.ChildDevices)
   queue.Enqueue(childDevice);

You don't need to convert device.ChildDevices to a list, because it is already enumerable. Calling ToList() is eager: it copies the whole collection into a new list first, whereas enumerating the collection directly is lazy.

Try that.

It would be a good idea to instrument or profile your code to find out where most of the time is going. An earlier comment and answer saying that the "load query to the database" (i.e. childDevices = device.ChildDevices.ToList();) is what takes the time may be correct, but it seems possible that it is instead tags.Remove(device.Tag); that is wasting time. A .Remove() is done for every enqueued item, and Remove takes O(n) time: "This method performs a linear search; therefore, this method is an O(n) operation, where n is Count." [MSDN]

That is, suppose you enqueue m device items, many of which have a .Tag that is not in your tags list of n entries. .Remove touches every element of tags when it looks for a .Tag that is not in the list, and on average it looks at n/2 entries to find a .Tag that is in the list, so the total work is O(m*n). By contrast, the work in the method below is O(m + n), which will typically be hundreds of times smaller.
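Plugging in the sizes from the question (m of about 2000 visited devices, n of about 1400 tags) gives a rough sense of the gap. This ignores constant factors and treats m*n as an upper bound on comparisons, so it is an order-of-magnitude estimate, not a timing prediction:

$$
m \cdot n \approx 2000 \times 1400 = 2{,}800{,}000, \qquad m + n \approx 2000 + 1400 = 3{,}400, \qquad \frac{m \cdot n}{m + n} \approx 824
$$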

To sidestep the problem:

  1. Preprocess the tags list by building a hash table H from it.
  2. For each device.Tag, test whether it is in H (a hash lookup).
  3. If it is in H, add device.Tag to a dictionary D.
  4. After handling all device.Tags, for each element T of the tags list, if T is in D suppress T, else output T.
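A sketch of these four steps in C#, assuming a hypothetical in-memory Device with a List<Device> of children. Since the goal is to return the tags that are not in the tree, step 4 keeps T only when it is absent from D. D is a HashSet rather than a Dictionary because only membership matters.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the question's Device entity.
public class Device
{
    public int ID;
    public string Tag;
    public List<Device> ChildDevices = new List<Device>();
}

public static class FourStepTagFilter
{
    public static List<string> RemoveInvalidTags(Device root, List<string> tags)
    {
        // Step 1: hash table H built from the tags list.
        var h = new HashSet<string>(tags);
        // D: tree tags that also occur in H.
        var d = new HashSet<string>();

        var queue = new Queue<Device>();
        queue.Enqueue(root);
        while (queue.Count > 0)
        {
            var device = queue.Dequeue();

            // Steps 2 and 3: average O(1) lookup instead of a linear scan.
            if (h.Contains(device.Tag))
                d.Add(device.Tag);

            foreach (var child in device.ChildDevices)
                queue.Enqueue(child);
        }

        // Step 4: keep tags that never appeared in the tree, in original order.
        var result = new List<string>();
        foreach (var t in tags)
            if (!d.Contains(t))
                result.Add(t);
        return result;
    }
}
```

Unlike returning a HashSet directly, the final pass over the original list preserves its order, and the whole method is O(m + n) on average.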

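Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.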

 
© 2020-2024 STACKOOM.COM