Convert array of paths into data structure

Question

I have an array of paths like so:

/doc/data/main.js
/doc/data/xl.js
/doc/data/dandu/sdasa.js
/mnt/data/la.js

I'm trying to build the following structure:

{
  "directories": {
    "/doc/data": {
      "directories": {
        "dandu": {
          "files": {
            "sdasa.js": 1
          }
        }
      },
      "files": {
        "main.js": 1,
        "xl.js": 1
      }
    },
    "/mnt/data": {
      "directories": {},
      "files": {
        "la.js": 1
      }
    }
  },
  "files": {}
}

Please ignore the values of the files in that example. I will assign more complex data to them in the future; for now the values are 1.

From a previous topic I found out that I could use the following function to get something similar:

var parsePathArray = function(paths) {
    var parsed = {};
    for(var i = 0; i < paths.length; i++) {
        var position = parsed;
        var split = paths[i].split('/');
        for(var j = 0; j < split.length; j++) {
            if(split[j] !== "") {
                if(typeof position[split[j]] === 'undefined')
                    position[split[j]] = {};
                position = position[split[j]];
            }
        }
    }
    return parsed;
}

The main problem with that solution is that it creates a node for every directory. I don't want that; I only want nodes for directories that contain at least one file. For example, /doc has no files in my example (only the directory /data), so it gets merged with the next path segment. I tried to change the function a bit, but it didn't work:

var str = '';
for (var j = 0; j < split.length; j++) {
    if (j < split.length - 1 && typeof this.files[str] === 'undefined') {
        str += '/' + split[j];
        continue;
    }
    if (str !== '') {
        if (typeof this.files[str] === 'undefined')
            this.files[str] = {};
        this.files = this.files[str];
    }
}

What would be the best way to convert those strings into that data structure?

Answer 1

Here is the solution I came up with. It works by building each path up one piece at a time and comparing it against the existing data structure. It should also handle files by themselves, as your original post seemed to imply that was necessary. I decided to split it into two functions in the end as it might make it easier to explain.

The Code:

const paths = [
  '/doc/data/main.js',
  'doc/data/xl.js',
  '/etc/further/owy.js',
  '/etc/further/abc.js',
  'etc/mma.js',
  '/mnt/data/it.js',
  '/mnt/data/path/is/long/la.js',
  'mnt/data/path/is/la.js',
  '/doc/data/dandu/sdasa.js',
  '/etc/i/j/k/l/thing.js',
  '/etc/i/j/areallylongname.js',
  'thing.js'
];

function buildStructure(paths) {
  let structure = {
    directories: {},
    files: {}
  };

  const compare = (a, b) => {
    return a.split('/').length - b.split('/').length;
  };

  [...paths]
    .map(path => path = path.charAt(0) === '/' ? path : `/${path}`)
    .sort((a, b) => compare(a, b))
    .forEach(path => {
      const nodes = path.split('/').slice(1);
      const file = nodes.pop();

      let pointer = findDirectory(nodes[0] ? structure.directories : structure, '', [...nodes]);

      pointer.files = pointer.files || {};
      pointer.files = {
        ...pointer.files,
        [file]: 1
      };
    });

  return structure;
}

function findDirectory(pointer, subPath, nodes) {
  if (nodes.length === 0) {
    if (subPath) {
      pointer[subPath] = {};
      pointer = pointer[subPath];
    }
    return pointer;
  }

  let newPath = `${subPath}/${nodes[0]}`;
  nodes.shift();

  if (pointer[newPath]) {
    pointer = pointer[newPath];

    if (nodes.length >= 1) {
      pointer.directories = pointer.directories || {};
      pointer = pointer.directories;
    }

    newPath = '';
  }

  return findDirectory(pointer, newPath, nodes);
}

const structure = buildStructure(paths);
console.log(structure);


The Explanation:

This ended up being a lot trickier (and much more interesting) than I imagined when I started working on it. Once you start concatenating directories, the order of operations really matters.

Starting in buildStructure, we map over the array of paths to catch any entries without a leading slash. Then we sort them according to the number of directories they reference. This is so we can be sure we are working from the top of the structure towards the bottom.
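As a minimal sketch of that first step in isolation (the helper name normalizeAndSort is made up for this illustration and is not part of the code above), the normalization and depth sort could look like this:

```javascript
// Hypothetical helper: isolates the normalize-then-sort step of buildStructure.
function normalizeAndSort(paths) {
  // Number of '/'-separated segments, used as a proxy for directory depth.
  const depth = path => path.split('/').length;

  return paths
    .map(path => (path.charAt(0) === '/' ? path : `/${path}`)) // add a missing leading slash
    .sort((a, b) => depth(a) - depth(b));                      // shallowest paths first
}

const sorted = normalizeAndSort([
  '/doc/data/dandu/sdasa.js',
  'doc/data/main.js',
  'thing.js'
]);
console.log(sorted);
// → [ '/thing.js', '/doc/data/main.js', '/doc/data/dandu/sdasa.js' ]
```

Sorting by depth means a parent directory such as /doc/data is always registered before any of its subdirectories is processed.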

Separate each path into an array of nodes, and pop off the file string. You're left with something like this:

const nodes = ['doc', 'data'];
const file = 'main.js';

Now we have to feed these nodes through findDirectory to find/create the file's location. The variable pointer is there to keep track of our position in the structure object, and any changes we make to the pointer will be replicated in the structure since they share reference equality.
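To illustrate the reference sharing mentioned here, a small stand-alone sketch (not taken from the code above, the values are only examples) shows that writing through the pointer also changes the structure:

```javascript
// Objects are handled by reference, so `pointer` and `structure.directories`
// name the same underlying object.
const structure = { directories: {}, files: {} };

let pointer = structure.directories;
pointer['/doc/data'] = {};        // creates the directory on `structure` as well
pointer = pointer['/doc/data'];   // move the pointer inside the new directory
pointer.files = { 'main.js': 1 }; // still writing into `structure`

console.log(JSON.stringify(structure));
// → {"directories":{"/doc/data":{"files":{"main.js":1}}},"files":{}}
```

Reassigning the pointer variable itself only moves our position; it never replaces anything inside the structure.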

The findDirectory function recursively processes each of the nodes to gradually build the path back up to its full length. Whenever we create a path that already exists in the structure's directories, we move inside it and start building the path up again to try and find the next one. If we can't find it, then we've got a brand new directory. The aim is to always end up inside the correct directory when we exit the function, creating it along the way if need be.

To simplify things, say we have only two paths to log:

const paths = [
  'doc/data/main.js',
  'doc/data/dandu/sdasa.js'
];

For the first path, findDirectory is going to make three passes. These are the parameters that will be given to it on each pass:

pointer = structure.directories > same > same

subPath = '' > '/doc' > '/doc/data'

nodes = ['doc', 'data'] > ['data'] > []

We never got a match, so as the function exits it creates that directory on structure.directories. Now, the second path is going to make four passes:

pointer = 
  structure.directories > 
  same > 
  structure.directories['/doc/data'].directories > 
  same

subPath = '' > '/doc' > '' > '/dandu' 

nodes = ['doc', 'data', 'dandu'] > ['data', 'dandu'] > ['dandu'] > []

As you can see, on the second pass we made the string /doc/data, which does exist on structure.directories. So we move into it, and since there are more nodes to process we create a new directories object there and enter that too. If there were no more nodes to process, we'd know we had already arrived at the correct level and this would not be necessary. From here it's a case of simply building the path back up again and repeating the process.
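The two traces can be reproduced with a sketch like the following. It repeats the logic of findDirectory and adds a trace array; both the name findDirectoryTraced and the trace parameter are additions made for this illustration only.

```javascript
// Same logic as findDirectory, plus a `trace` array (an addition for this
// illustration) that records subPath and nodes on every pass.
function findDirectoryTraced(pointer, subPath, nodes, trace) {
  trace.push({ subPath, nodes: [...nodes] });

  if (nodes.length === 0) {
    if (subPath) {
      pointer[subPath] = {};
      pointer = pointer[subPath];
    }
    return pointer;
  }

  let newPath = `${subPath}/${nodes[0]}`;
  nodes.shift();

  if (pointer[newPath]) {
    pointer = pointer[newPath];
    if (nodes.length >= 1) {
      pointer.directories = pointer.directories || {};
      pointer = pointer.directories;
    }
    newPath = '';
  }
  return findDirectoryTraced(pointer, newPath, nodes, trace);
}

const structure = { directories: {}, files: {} };

const firstTrace = [];
findDirectoryTraced(structure.directories, '', ['doc', 'data'], firstTrace);

const secondTrace = [];
findDirectoryTraced(structure.directories, '', ['doc', 'data', 'dandu'], secondTrace);

console.log(firstTrace.map(pass => pass.subPath));  // → [ '', '/doc', '/doc/data' ]
console.log(secondTrace.map(pass => pass.subPath)); // → [ '', '/doc', '', '/dandu' ]
```

Note that with this approach nested keys keep their leading slash, so the subdirectory ends up under the key '/dandu'.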

Once we're in the right directory, we can place the file directly on the pointer and it will be registered on the structure. Once we move to the next path, the pointer will once again be pointing at structure.directories.

In cases where there are no nodes to process (file name only), pass findDirectory the whole structure object instead and the file will go into the top level of the object.

Hopefully this explains things well enough and will be useful to you. I enjoyed working on this and would be pleased for any suggestions on how to improve it.

Answer 2

This challenge was far from trivial. Nevertheless, the approach below breaks it down into subtasks that are easy to read and understand, and therefore maintainable, in order to reach the OP's goal...

const pathList = [
  '/doc/data/main.js',
  '/doc/data/fame.js',
  '/doc/data/fame.es',
  '/doc/data/xl.js',
  '/doc/data/dandu/sdasa.js',
  '/mnt/data/la.js',
  '/mnt/la.es',
  'foo/bar/baz/biz/foo.js',
  'foo/bar/baz/biz/bar.js',
  '/foo/bar.js',
  '/foo/bar/baz/foo.js',
  'foo/bar/baz/bar.js',
  'foo/bar/baz/biz.js',
  '/foobar.js',
  'bazbiz.js',
  '/etc/further/owy.js',
  '/etc/further/abc.js',
  'etc/mma.js',
  '/etc/i/j/k/l/thing.js',
  '/etc/i/j/areallylongname.js'
];

function createSeparatedPathAndFileData(path) {
  const regXReplace = (/^\/+/);     // for replacing leading slash sequences in `path`.
  const regXSplit = (/\/([^/]*)$/); // for retrieving separated path- and file-name data.

  const filePartials = path.replace(regXReplace, '').split(regXSplit);
  if (filePartials.length === 1) {
    // assure at least an empty `pathName`.
    filePartials.unshift('');
  }
  const [pathName, fileName] = filePartials;

  return {
    pathName,
    fileName
  };
}

function compareByPathAndFileNameAndExtension(a, b) {
  const regXSplit = (/\.([^.]*)$/); // split for filename and captured file extension.

  const [aName, aExtension] = a.fileName.split(regXSplit);
  const [bName, bExtension] = b.fileName.split(regXSplit);

  return (
    a.pathName.localeCompare(b.pathName)
    || aName.localeCompare(bName)
    || aExtension.localeCompare(bExtension)
  );
}

function getRightPathPartial(root, pathName) {
  let rightPartial = null; // null || string.

  const partials = pathName.split(`${ root }\/`);
  if ((partials.length === 2) && (partials[0] === '')) {
    rightPartial = partials[1];
  }
  return rightPartial; // null || string.
}

function getPathPartials(previousPartials, pathName) {
  let pathPartials = Array.from(previousPartials);
  let rightPartial;

  while (!rightPartial && pathPartials.pop() && (pathPartials.length >= 1)) {
    rightPartial = getRightPathPartial(pathPartials.join('\/'), pathName);
  }
  if (pathPartials.length === 0) {
    pathPartials.push(pathName);
  } else if (rightPartial) {
    pathPartials = pathPartials.concat(rightPartial);
  }
  return pathPartials;
}

function createPathPartialDataFromCurrentAndPreviousItem(fileData, idx, list) {
  const previousItem = list[idx - 1];
  if (previousItem) {

    const previousPathName = previousItem.pathName;
    const currentPathName = fileData.pathName;

    if (previousPathName === currentPathName) {

      // duplicate/copy path partials.
      fileData.pathPartials = [].concat(previousItem.pathPartials);

    } else {
      // a) try an instant match first ...

      const rightPartial = getRightPathPartial(previousPathName, currentPathName);
      if (rightPartial || (previousPathName === currentPathName)) {

        // concat path partials.
        fileData.pathPartials = previousItem.pathPartials.concat(rightPartial);

      } else {
        // ... before b) programmatically work back the root-path
        //               and look each time for another partial match.

        fileData.pathPartials = getPathPartials(
          previousItem.pathPartials,
          fileData.pathName
        );
      }
    }
  } else {
    // initialize partials by adding path name.
    fileData.pathPartials = [fileData.pathName];
  }
  return fileData;
}

function isUnassignedIndex(index) {
  return (Object.keys(index).length === 0);
}

function assignInitialIndexProperties(index) {
  return Object.assign(index, {
    directories: {},
    files: {}
  });
}

function assignFileDataToIndex(index, fileData) {
  if (isUnassignedIndex(index)) {
    assignInitialIndexProperties(index);
  }
  const { pathPartials, fileName } = fileData;

  let path, directories;
  let subIndex = index;

  while (path = pathPartials.shift()) {
    directories = subIndex.directories;

    if (path in directories) {
      subIndex = directories[path];
    } else {
      subIndex = directories[path] = assignInitialIndexProperties({});
    }
  }
  subIndex.files[fileName] = 1;

  return index;
}

console.log(
  'input :: path list ...',
  pathList
  //.map(createSeparatedPathAndFileData)
  //.sort(compareByPathAndFileNameAndExtension)
  //.map(createPathPartialDataFromCurrentAndPreviousItem)
  //.reduce(assignFileDataToIndex, {})
);
console.log(
  '1st :: create separated path and file data from the original list ...',
  pathList
    .map(createSeparatedPathAndFileData)
  //.sort(compareByPathAndFileNameAndExtension)
  //.map(createPathPartialDataFromCurrentAndPreviousItem)
  //.reduce(assignFileDataToIndex, {})
);
console.log(
  '2nd :: sort previous data by comparing path- and file-names and its extensions ...',
  pathList
    .map(createSeparatedPathAndFileData)
    .sort(compareByPathAndFileNameAndExtension)
  //.map(createPathPartialDataFromCurrentAndPreviousItem)
  //.reduce(assignFileDataToIndex, {})
);
console.log(
  '3rd :: create partial path data from current/previous items of the sorted list ...',
  pathList
    .map(createSeparatedPathAndFileData)
    .sort(compareByPathAndFileNameAndExtension)
    .map(createPathPartialDataFromCurrentAndPreviousItem)
  //.reduce(assignFileDataToIndex, {})
);
console.log(
  '4th :: output :: assemble final index from before created list of partial path data ...',
  pathList
    .map(createSeparatedPathAndFileData)
    .sort(compareByPathAndFileNameAndExtension)
    .map(createPathPartialDataFromCurrentAndPreviousItem)
    .reduce(assignFileDataToIndex, {})
);


... and as one can see from the logs above those tasks are...

Sanitizing and (Re)structuring/Mapping

Each path gets sanitized/normalized by removing a possible sequence of leading slashes.
A list of file data items is built; each item contains the pathName and the fileName of the corresponding path, both in sanitized/normalized form.

E.g. '/doc/data/dandu/sdasa.js' gets mapped into...

{
  "pathName": "doc/data/dandu",
  "fileName": "sdasa.js"
}
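The mapping step can be sketched in condensed form as follows. The function name splitPathAndFile is made up for this illustration; the two regular expressions are the ones used in createSeparatedPathAndFileData.

```javascript
// Strip leading slashes, then split at the last remaining slash.
function splitPathAndFile(path) {
  const parts = path.replace(/^\/+/, '').split(/\/([^/]*)$/);
  // With a capture group in the separator, split() also returns the captured
  // text, so a match yields [pathName, fileName, ''].
  // Without any slash there is no match and the result is [fileName].
  if (parts.length === 1) {
    parts.unshift('');
  }
  const [pathName, fileName] = parts;
  return { pathName, fileName };
}

console.log(splitPathAndFile('/doc/data/dandu/sdasa.js'));
// → { pathName: 'doc/data/dandu', fileName: 'sdasa.js' }
console.log(splitPathAndFile('bazbiz.js'));
// → { pathName: '', fileName: 'bazbiz.js' }
```

The empty pathName in the second case is what later places such a file at the top level of the index.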

Sorting

The sorting is done by comparing the properties of two currently mapped file data items in the following way...

compare by pathName
compare by fileName without extension
compare by file extension

Thus an original file list that looks like this...

[
  '/doc/data/main.js',
  '/doc/data/fame.js',
  '/doc/data/fame.es',
  '/doc/data/dandu/sdasa.js',
  'foo/bar/baz/biz/bar.js',
  '/foo/bar.js',
  'foo/bar/baz/biz.js',
  '/foobar.js'
]

... will be (sanitized/normalized, mapped and) sorted into something like this...

[{
  "pathName": "",
  "fileName": "foobar.js"
}, {
  "pathName": "doc/data",
  "fileName": "fame.es"
}, {
  "pathName": "doc/data",
  "fileName": "fame.js"
}, {
  "pathName": "doc/data",
  "fileName": "main.js"
}, {
  "pathName": "doc/data/dandu",
  "fileName": "sdasa.js"
}, {
  "pathName": "foo",
  "fileName": "bar.js"
}, {
  "pathName": "foo/bar/baz",
  "fileName": "biz.js"
}, {
  "pathName": "foo/bar/baz/biz",
  "fileName": "bar.js"
}]

The sorting is fundamental since the algorithm that follows right after relies on neatly sorted/aligned pathNames.
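A minimal sketch of such a three-level comparator is shown below. The name compareFileData is hypothetical, and treating a missing extension as an empty string is an assumption made for this sketch rather than something the code above does.

```javascript
// Hypothetical comparator mirroring the three-level comparison described above.
// Assumption: a missing extension is treated as an empty string.
function compareFileData(a, b) {
  const splitExtension = fileName => {
    const [name, extension = ''] = fileName.split(/\.([^.]*)$/);
    return { name, extension };
  };
  const fileA = splitExtension(a.fileName);
  const fileB = splitExtension(b.fileName);

  return (
    a.pathName.localeCompare(b.pathName) ||
    fileA.name.localeCompare(fileB.name) ||
    fileA.extension.localeCompare(fileB.extension)
  );
}

const items = [
  { pathName: 'doc/data', fileName: 'main.js' },
  { pathName: 'doc/data', fileName: 'fame.js' },
  { pathName: 'doc/data', fileName: 'fame.es' },
  { pathName: '', fileName: 'foobar.js' }
];

console.log(items.sort(compareFileData).map(item => `${item.pathName}/${item.fileName}`));
// → [ '/foobar.js', 'doc/data/fame.es', 'doc/data/fame.js', 'doc/data/main.js' ]
```

Because localeCompare returns 0 for equal strings, the || chain only falls through to the next criterion when the previous one ties.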

Splitting into And Clustering of Path Partials

In order to keep this task dead stupid it is done by a mapping process that uses not only the currently processed item but also this item's previous sibling (or predecessor).

An additional pathPartials list will be built by splitting the current pathName with the previous one.

E.g. 'foo/bar/baz' will be split using the previous 'foo' (plus a trailing slash) as the separator. Thus 'bar/baz' already is a clustered partial path that will be used to create the current file data item's pathPartials list by concatenating this very partial to the pathPartials list of its previous sibling (which by this time is ['foo']). Thus the result of the former will be ['foo', 'bar/baz'].

The same happens to 'foo/bar/baz/biz' with a previous path name of 'foo/bar/baz' and a previous partial list of ['foo', 'bar/baz'] . The split result will be 'biz' , the new partial list will be ['foo', 'bar/baz', 'biz'] .
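The two examples above can be checked with a compact restatement of getRightPathPartial (same behavior as the function in the snippet, just shortened for this illustration):

```javascript
// Returns the part of `pathName` that follows `root + '/'`, or null when
// splitting on that separator does not yield exactly ['', rest].
function getRightPathPartial(root, pathName) {
  const partials = pathName.split(`${root}/`);
  return (partials.length === 2 && partials[0] === '') ? partials[1] : null;
}

console.log(getRightPathPartial('foo', 'foo/bar/baz'));             // → 'bar/baz'
console.log(getRightPathPartial('foo/bar/baz', 'foo/bar/baz/biz')); // → 'biz'
console.log(getRightPathPartial('doc/data', 'foo'));                // → null

// Appending the partial to the previous item's list gives the clustered result.
const previousPartials = ['foo'];
console.log(previousPartials.concat(getRightPathPartial('foo', 'foo/bar/baz')));
// → [ 'foo', 'bar/baz' ]
```

A null result signals that the previous path is not a direct prefix, which is the case where the longer walk back through the previous partials is needed.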

The sorted file data list from above then does map into this new list...

[{
  "pathName": "",
  "fileName": "foobar.js",
  "pathPartials": [
    ""
  ]
}, {
  "pathName": "doc/data",
  "fileName": "fame.es",
  "pathPartials": [
    "doc/data"
  ]
}, {
  "pathName": "doc/data",
  "fileName": "fame.js",
  "pathPartials": [
    "doc/data"
  ]
}, {
  "pathName": "doc/data",
  "fileName": "main.js",
  "pathPartials": [
    "doc/data"
  ]
}, {
  "pathName": "doc/data/dandu",
  "fileName": "sdasa.js",
  "pathPartials": [
    "doc/data",
    "dandu"
  ]
}, {
  "pathName": "foo",
  "fileName": "bar.js",
  "pathPartials": [
    "foo"
  ]
}, {
  "pathName": "foo/bar/baz",
  "fileName": "biz.js",
  "pathPartials": [
    "foo",
    "bar/baz"
  ]
}, {
  "pathName": "foo/bar/baz/biz",
  "fileName": "bar.js",
  "pathPartials": [
    "foo",
    "bar/baz",
    "biz"
  ]
}]

Assemble Final Index

The last step is a simple list-reducing task since, at this point, the most difficult part, correctly splitting and clustering each item's path partials, has already been achieved.
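As a rough sketch of that reduce step, assuming the pathPartials have already been computed as shown in the list above: the names createNode and addToIndex are hypothetical, and unlike assignFileDataToIndex this variant iterates over the partials without consuming them.

```javascript
// Hypothetical helper that creates an empty directory node.
const createNode = () => ({ directories: {}, files: {} });

// Reducer: walks the clustered partials, creating nodes on demand, then
// registers the file at the final node. An empty partial ('') means "top level".
function addToIndex(index, { pathPartials, fileName }) {
  let node = index;
  for (const partial of pathPartials) {
    if (partial === '') continue;
    if (!(partial in node.directories)) {
      node.directories[partial] = createNode();
    }
    node = node.directories[partial];
  }
  node.files[fileName] = 1;
  return index;
}

const index = [
  { pathPartials: [''], fileName: 'foobar.js' },
  { pathPartials: ['foo'], fileName: 'bar.js' },
  { pathPartials: ['foo', 'bar/baz'], fileName: 'biz.js' }
].reduce(addToIndex, createNode());

console.log(JSON.stringify(index, null, 2));
```

Here foobar.js lands in the top-level files object, bar.js under the foo directory, and biz.js under the clustered bar/baz directory inside foo.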

Answer 3

You could accomplish it with a somewhat recursive function. Keep in mind this is only one possible solution and probably not the best one.

const workPath = (path, structure) => {
    if(!structure) structure = {};

    const folders = path.split("/");
    const file = folders.pop();

    // Check whether any of the possible paths are available
    let breakPoint = null;
    let tempPath;
    for(let i = 0; i< folders.length; i++){
        const copy = [... folders];
        tempPath = copy.splice(0, i+1).join("/");

        if(structure[tempPath]){
            breakPoint = i;
            break;
        }        
    }

    // If there was no path available, we create it in the structure
    if(breakPoint == null){
        const foldersPath = folders.join("/");
        structure[foldersPath]= {};
        structure[foldersPath]["files"] = {};
        structure[foldersPath]["files"][file] = 1;
    }

    // If there is a path inside of the structure, that also is the entire path we are working with,
    // We just add the file to the path
    else if(breakPoint === folders.length - 1){
        structure[folders.join("/")]["files"][file] = 1;
    }
    
    // If we get here, it means that some part of the path is available but not the entire path
    // So, we just call the workPath function recursively with only one portion of the path
    else{
        const subPath = folders.splice(breakPoint + 1).join("/") + "/" + file;
        
        structure[tempPath]["directories"] = workPath(subPath, structure[tempPath]["directories"]);  
    }

    return structure;
}

const convert = array => {
    let structure = {};
    for(let path of array){
        structure = workPath(path, structure);
    }

    return structure;
}

The "convert" function expects an array of all paths.

Keep in mind, this solution doesn't consider entries with no files in them.
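The prefix search at the top of workPath can also be looked at in isolation. The following is only a sketch; the helper name findExistingPrefix is hypothetical, and it uses slice instead of copying the array and calling splice.

```javascript
// Hypothetical helper: returns the index of the last folder in the shortest
// prefix of `folders` that already exists as a key in `structure`, or null.
function findExistingPrefix(structure, folders) {
  for (let i = 0; i < folders.length; i++) {
    const prefix = folders.slice(0, i + 1).join('/');
    if (structure[prefix]) {
      return i;
    }
  }
  return null;
}

const structure = { 'doc/data': { files: { 'main.js': 1 } } };

console.log(findExistingPrefix(structure, ['doc', 'data', 'dandu'])); // → 1
console.log(findExistingPrefix(structure, ['mnt', 'data']));          // → null
```

Since a valid index can be 0, the result is best compared against null explicitly rather than tested for truthiness.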

3 answers

solution1 1 2020-08-11 13:06:30

solution2 0 2020-08-11 16:26:25

solution3 -1 2020-08-10 22:59:22
