
Algorithm to make a tree from a list of paths

The task is to build a tree from a sorted list of paths. Each node is a filesystem object (file or folder).
Currently I'm using this (pseudocode):

foreach (string path in pathList)
{
    // Every path starts its descent at the root again
    INode currentNode = rootNode;
    StringCollection pathTokens = path.split(pathSplitter);
    foreach (string pathToken in pathTokens)
    {
        if (currentNode.Children.contains(pathToken))
        {
            // Node already exists: step into it
            currentNode = currentNode.Children.find(pathToken);
        }
        else
        {
            // Node is missing: create it and step into it
            currentNode = currentNode.Children.Add(pathToken);
        }
    }
}

pathSplitter is \ on Windows and / on *nix.
Is there a more efficient way to solve this task?

The key property of your input data is that the list of paths is sorted. Hence you can work with the common prefix between the current and previous paths quite efficiently. What you can do is maintain the last trace through the tree data structure, from its root to the leaf folder node. Then for the current path you just walk along the previous trace (i.e. process the current path relative to the last path) instead of finding the right position in the tree again and again.

When comparing the last and current path, three cases may happen:

1) Same parent folder

\path\to\folder\file1.txt
\path\to\folder\file2.txt

The trace stays the same; a node for file2.txt is added.

2) New path is a subpath

\path\to\folder\file1.txt
\path\to\folder\subfolder\file2.txt

Nodes for subfolder and file2.txt are added.

3) New path is different

\path\to\folder\file1.txt
\path\to\another_folder\subfolder\file2.txt

First you need to backtrack the trace until it represents \path\to\. Then nodes for another_folder, subfolder and file2.txt are added. (Note that the another_folder\subfolder\ portion may be missing completely, in which case only the node for file2.txt is added after backtracking.)
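
The three cases above collapse into one rule: keep the nodes up to the common prefix with the previous path, drop the rest of the trace, and add whatever is left of the new path. Here is a minimal Python sketch of that idea, not a definitive implementation. The names Node, build_tree and trace are made up for illustration, and it assumes the list is sorted component by component. The get-or-create step when descending is an extra safety net of mine, so the result stays correct even if the ordering is slightly different (e.g. a plain string sort).

```python
class Node:
    """Hypothetical tree node: a name plus children keyed by name."""
    def __init__(self, name):
        self.name = name
        self.children = {}  # child name -> Node


def build_tree(sorted_paths, sep="/"):
    root = Node("")
    trace = [root]    # nodes from the root down to the end of the previous path
    prev_tokens = []  # tokens of the previous path
    for path in sorted_paths:
        # Drop empty tokens, e.g. the one before a leading separator
        tokens = [t for t in path.split(sep) if t]

        # Count how many leading tokens are shared with the previous path
        common = 0
        limit = min(len(tokens), len(prev_tokens))
        while common < limit and tokens[common] == prev_tokens[common]:
            common += 1

        # Backtrack: keep the root plus the shared part of the trace
        del trace[common + 1:]

        # Add (or reuse) nodes for the remaining tokens
        node = trace[-1]
        for token in tokens[common:]:
            child = node.children.get(token)
            if child is None:
                child = node.children[token] = Node(token)
            node = child
            trace.append(node)
        prev_tokens = tokens
    return root


root = build_tree([
    "/path/to/another_folder/subfolder/file2.txt",
    "/path/to/folder/file1.txt",
    "/path/to/folder/subfolder/file2.txt",
])
```

For Windows-style paths you would pass sep="\\" instead. Whether this beats the simple loop depends on how deep the paths are and how much of each prefix is shared, so it is still worth measuring.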

Depending on the overall characteristics and volume of the data, such an algorithm may perform faster. You could work out some formal Big-O estimates, but I think it would be quicker to just test it.

The algorithm seems optimal to me; if I am not mistaken, the sorting of paths implies that the nodes will be generated in a depth-first sequence with respect to the tree from which they originate. This means that no unnecessary backtracking in the graph is performed. Furthermore, the algorithm is linear in the number of paths in the input and every path is processed in time linear in its length, so the overall running time is linear in the size of the input. Complexity-wise, this means that the algorithm is optimal, since it is impossible to read all paths with lower runtime complexity.
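
Note that the linear bound relies on the child lookup being constant time on average, e.g. a hash map per node. A small Python sketch of the question's loop, under that assumption, which counts the lookups it performs (build_tree_counting is a made-up name and nested dicts stand in for INode):

```python
def build_tree_counting(paths, sep="/"):
    """The question's loop with nested dicts as nodes; also counts child lookups."""
    root = {}
    lookups = 0
    for path in paths:
        node = root  # every path starts at the root
        for token in path.split(sep):
            if not token:
                continue  # skip the empty token before a leading separator
            lookups += 1
            # Get-or-create the child in a single hash lookup
            node = node.setdefault(token, {})
    return root, lookups


paths = ["/path/to/folder/file1.txt", "/path/to/folder/file2.txt"]
tree, lookups = build_tree_counting(paths)
# One lookup per path component, so the work grows with the input size
total_tokens = sum(len([t for t in p.split("/") if t]) for p in paths)
```

Here lookups equals total_tokens: one lookup per path component. If Children were a plain list searched linearly, each lookup would cost time proportional to the number of siblings and the bound would no longer hold.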
