Algorithm to remove all objects of a tree from a list

I have a problem where I need to remove all objects of a tree from a list.

I have a List<String> Tags which contains the tags in my entire system that match a certain criterion (generally starts with some search string). I also have a root Device object. The Device class is described as follows:

public class Device
{
    public int ID;
    public String Tag;
    public EntityCollection<Device> ChildDevices;
}

My attempt uses a breadth-first search and removes the tags from the list as each node is visited, then returns whatever is left over:

private List<String> RemoveInvalidTags(Device root, List<String> tags)    
{
    var queue = new Queue<Device>();
    queue.Enqueue(root);

    while (queue.Count > 0)
    {
        var device = queue.Dequeue();
        //load all the child devices of this device from DB
        var childDevices = device.ChildDevices.ToList();

        foreach (var hierarchyItem in childDevices)
            queue.Enqueue(hierarchyItem.ChildDevice);

        tags.Remove(device.Tag);
    }

    return tags;
}

At the moment I am visiting 2000+ device nodes and removing from a list of about 1400 tags (reduced by the search string). This takes about 4 seconds, which is far too long.

I have tried changing the list of tags to a HashSet, but it brings negligible speed improvement.

Any ideas of an algorithm/change that I could use to make this faster?

I'm going to guess that your tree is fairly "fat". That is, that each of your nodes has MANY children, but you don't have a lot of layers. If that is the case, give Depth First Search a try. You should reach bottom quickly and then be able to start removing nodes. You still have to visit all nodes, but you won't have to store as much intermediate data as you would in BFS.

You should definitely be using some sort of hash table (sorry, I'm not familiar with the specifics of C#) for accessing tags.

I am curious about the process of loading the child devices from the DB. Since you are iterating across the entire tree, you might be able to load more appropriately-sized chunks into memory. The breadth-first search might load most of the tree into memory before starting to remove nodes from the queue (if the tree is very wide).
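As a rough illustration of the depth-first suggestion combined with a hash-based tag collection, here is a minimal sketch. It assumes the tree is already in memory: DeviceNode and TagFilter are hypothetical stand-ins, not the Device entity or EntityCollection from the question, and database loading is left out entirely.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the Device entity in the question.
public class DeviceNode
{
    public string Tag;
    public List<DeviceNode> Children = new List<DeviceNode>();
}

public static class TagFilter
{
    // Depth-first traversal with an explicit stack. Every tag found on a
    // visited node is removed from the set; whatever remains is returned.
    public static HashSet<string> RemoveTreeTags(DeviceNode root, IEnumerable<string> tags)
    {
        var remaining = new HashSet<string>(tags);
        var stack = new Stack<DeviceNode>();
        stack.Push(root);

        while (stack.Count > 0)
        {
            var device = stack.Pop();

            // HashSet<T>.Remove is O(1) on average, unlike List<T>.Remove.
            remaining.Remove(device.Tag);

            foreach (var child in device.Children)
                stack.Push(child);
        }

        return remaining;
    }
}
```

Whether this actually beats the breadth-first version depends on the shape of the tree and on how the children are loaded, so it is worth measuring both.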

You can use Stopwatch to find the bottleneck. If you ask me,

var childDevices = device.ChildDevices.ToList();

foreach (var hierarchyItem in childDevices)
   queue.Enqueue(hierarchyItem.ChildDevice);

that's your bottleneck.
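To check that guess rather than assume it, the two phases of the loop can be timed separately. The following is only a sketch: TraversalTiming, loadChildren and removeTag are hypothetical names, with loadChildren standing in for the ChildDevices load and removeTag for tags.Remove(device.Tag).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class TraversalTiming
{
    // Runs a breadth-first traversal and accumulates the time spent in each
    // phase. Stopwatch.Start() resumes a stopped stopwatch, so Elapsed is
    // the running total across all iterations.
    public static (TimeSpan Load, TimeSpan Remove) Measure<T>(
        T root, Func<T, IEnumerable<T>> loadChildren, Action<T> removeTag)
    {
        var loadTimer = new Stopwatch();
        var removeTimer = new Stopwatch();
        var queue = new Queue<T>();
        queue.Enqueue(root);

        while (queue.Count > 0)
        {
            var item = queue.Dequeue();

            // Phase 1: load and enqueue the children (enqueue cost included).
            loadTimer.Start();
            foreach (var child in loadChildren(item))
                queue.Enqueue(child);
            loadTimer.Stop();

            // Phase 2: remove the tag from the tag collection.
            removeTimer.Start();
            removeTag(item);
            removeTimer.Stop();
        }

        return (loadTimer.Elapsed, removeTimer.Elapsed);
    }
}
```

With the question's types, a call might look like Measure(root, d => d.ChildDevices, d => tags.Remove(d.Tag)); comparing the two totals shows which phase dominates.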

Look at this Tree implementation in C#; I hope you already know tree traversals.

Why don't you try this?

foreach (var hierarchyItem in device.ChildDevices)
   queue.Enqueue(hierarchyItem.ChildDevice);

You don't need to convert device.ChildDevices to a list, because it is already enumerable. Converting it to a list evaluates it eagerly, whereas enumerating it directly is lazy.

Try that.

It would be a good idea to instrument or profile your code to find out where most of the time is going. An earlier comment and answer suggesting that the database load (i.e. childDevices = device.ChildDevices.ToList();) takes the time may be correct, but it is also possible that tags.Remove(device.Tag); is what wastes the time. Remove() is called once for every enqueued item, and on a list it takes O(n) time: "This method performs a linear search; therefore, this method is an O(n) operation, where n is Count." [MSDN]

That is, suppose you enqueue m device items, many of which have a Tag that is not in your tags list of n entries. Remove touches every element of tags when it looks for a Tag that is not in the list, and on average it looks at n/2 entries to find a Tag that is in the list, so the total work is O(m*n). By contrast, the work in the method below is O(m + n), which typically will be hundreds of times smaller.

To sidestep the problem:

  1. Preprocess the tags list by building a hash table H from it
  2. For each device.Tag, test whether it is in H
  3. If it is in H, add device.Tag to a dictionary D
  4. After handling every device.Tag, go through each element T of the tags list: if T is in D, suppress T (it belongs to the tree), else output T
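The steps above can be sketched as follows, assuming the goal stated in the question: return the tags that do not belong to any device in the tree. TagSetFilter and TagsNotInTree are hypothetical names, and deviceTags is assumed to be the sequence of Tag values collected while traversing the tree.

```csharp
using System.Collections.Generic;

public static class TagSetFilter
{
    // tags: the candidate list (n entries).
    // deviceTags: the Tag of every visited device (m entries).
    // Total work is O(m + n) on average, since each hash lookup is O(1).
    public static List<string> TagsNotInTree(
        IReadOnlyList<string> tags, IEnumerable<string> deviceTags)
    {
        var h = new HashSet<string>(tags);   // step 1: hash table H
        var d = new HashSet<string>();       // D: candidate tags seen in the tree

        foreach (var tag in deviceTags)
            if (h.Contains(tag))             // step 2: is the tag a candidate?
                d.Add(tag);                  // step 3: record it in D

        // Step 4: one pass over the original list, keeping its order and
        // dropping every tag that was found in the tree.
        var result = new List<string>(tags.Count);
        foreach (var t in tags)
            if (!d.Contains(t))
                result.Add(t);

        return result;
    }
}
```

A HashSet is used for D because only membership matters here; a Dictionary would work equally well if a value needed to be attached to each tag.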


 