
How do I create a nested object based on directory structure and files

OK, this has been hurting my brain (if any) for some time now – yes, recursive functions are hard!

What I'm trying to achieve: create an object that mirrors a directory structure containing subdirectories and files, where each directory becomes a key whose value is an object with filenames as keys and the corresponding file contents as values (see fig. 2).

If I have a directory structure that looks like this:

Fig 1

LEVEL_1
    LEVEL_2
    |   LEVEL_3_1
    |   |   FILE_3_1_1
    |   |   FILE_3_1_2
    |   LEVEL_3_2
    |   |   FILE_3_2_1
    |   |   FILE_3_2_2
    |   |   LEVEL_4
    |   |   |   FILE_4_1
    |   |   |   FILE_4_2
    |   |   |   ... this could go on forever ...
    |   FILE_2_1
    |   FILE_2_2
    FILE_1_1
    FILE_1_2

I'd like to get an object that looks like this (the object itself represents LEVEL_1):

Fig 2

{
    LEVEL_2 : {
        LEVEL_3_1 : {
            FILE_3_1_1 : "FILE CONTENT",
            FILE_3_1_2 : "FILE CONTENT"
        },
        LEVEL_3_2 : {
            FILE_3_2_1 : "FILE CONTENT",
            FILE_3_2_2 : "FILE CONTENT",
            LEVEL_4 : {
                FILE_4_1 : "FILE CONTENT",
                FILE_4_2 : "FILE CONTENT"
            }
        },
        FILE_2_1 : "FILE CONTENT",
        FILE_2_2 : "FILE CONTENT"
    },
    FILE_1_1 : "FILE CONTENT",
    FILE_1_2 : "FILE CONTENT"
}

So, basically all DIRS become objects, and all containing files become keys on that object, and the file content becomes corresponding values.

I've managed to get this far, but have issues dynamically creating the nested objects based on this recursive function (basically, how do I check if a deeply nested object already exists and add another object to it):

    let views_dir = config.root + '/views/',
        vo = {};

    var walkSync = function( dir, filelist ) {
        var fs = fs || require('fs'),
            files = fs.readdirSync(dir);

        filelist = [];

        files.forEach(function( file ) {
            if ( fs.statSync( dir + file ).isDirectory() ) {

                /**
                 * Create nested object of namespaces in some dynamic fashion
                 * Check for current dir in object and add it as namespace in the right structure in vo (object) …
                 */

                vo[file] = {};

                filelist = walkSync(dir + file + '/', filelist);

                filelist.forEach(function ( filename ) {
                    // <-- I shouldn't have to be doing this in here since files are handled in the else clause below ... but, I told you, recursion makes my head spin.
                    vo[file][filename.split('.')[0]] = "FILE CONTENT";
                });
            } else {
                filelist.push(file);

                /**
                 * Add file to current namespace if any
                 */
                vo[file.split('.')[0]] = "FILE CONTENT";
            }
        });

        return filelist;
    };

    return walkSync( views_dir );

Now, I'm looking for some sort of way to dynamically add nested 'namespaces' to an object. I've been going around in circles creating arrays from dirs, and then trying to concatenate them into dot syntax, and all sorts of other weird stuff ... now my brain just hurts and I need some help.

And, I've found hundreds of recursive functions online that do everything except what I need ...

To verify any of this works, we first recreate the directory structure in the original question. I'm using unique file contents so we can verify file contents are properly matched with their corresponding keys -

$ mkdir -p level_1/level_2/level_3_1 level_1/level_2/level_3_2/level_4
$ echo "file_1_1 content" > level_1/file_1_1
$ echo "file_1_2 content" > level_1/file_1_2
$ echo "file_3_1_1 content" > level_1/level_2/level_3_1/file_3_1_1
$ echo "file_3_1_2 content" > level_1/level_2/level_3_1/file_3_1_2
$ echo "file_3_2_1 content" > level_1/level_2/level_3_2/file_3_2_1
$ echo "file_3_2_2 content" > level_1/level_2/level_3_2/file_3_2_2
$ echo "file_4_1 content" > level_1/level_2/level_3_2/level_4/file_4_1
$ echo "file_4_2 content" > level_1/level_2/level_3_2/level_4/file_4_2
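If a POSIX shell isn't available, the same fixture can be created from Node itself. This is only a sketch: makeFixture is a hypothetical helper name, not something from the question, and it assumes a Node version where fs.mkdirSync accepts the recursive option -

```javascript
const fs = require("fs")
const path = require("path")

// makeFixture is a hypothetical helper; it recreates the tree built by the
// shell commands above underneath `root` and returns the path to level_1
const makeFixture = (root = ".") => {
  const dirs = [
    "level_1/level_2/level_3_1",
    "level_1/level_2/level_3_2/level_4",
  ]
  const files = [
    "level_1/file_1_1",
    "level_1/file_1_2",
    "level_1/level_2/level_3_1/file_3_1_1",
    "level_1/level_2/level_3_1/file_3_1_2",
    "level_1/level_2/level_3_2/file_3_2_1",
    "level_1/level_2/level_3_2/file_3_2_2",
    "level_1/level_2/level_3_2/level_4/file_4_1",
    "level_1/level_2/level_3_2/level_4/file_4_2",
  ]
  // recursive: true behaves like `mkdir -p`
  for (const d of dirs)
    fs.mkdirSync(path.join(root, d), { recursive: true })
  // `echo` appends a newline, so each file's content ends with "\n" here too
  for (const f of files)
    fs.writeFileSync(path.join(root, f), `${path.basename(f)} content\n`)
  return path.join(root, "level_1")
}
```

Calling makeFixture() with no argument creates level_1 in the current working directory, matching the shell version.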

Now our function, dir2obj, which builds an object representation of a file system, starting with a root path -

const { readdir, readFile, stat } =
  require ("fs") .promises

const { join } =
  require ("path")

const dir2obj = async (path = ".") =>
  (await stat (path)) .isFile ()
    ? String (await readFile (path))
    : Promise
        .all
          ( (await readdir (path))
              .map
                ( p => 
                    dir2obj (join (path, p))
                      .then (obj => ({ [p]: obj }))
                )
          )
        .then (results => Object.assign ({}, ...results)) // {} keeps empty directories from throwing

// run it
dir2obj ("./level_1")
  .then (console.log, console.error)
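The final merge step may be easier to see in isolation. Each recursive call resolves to a single-key object, and Object.assign folds the array of them into one. The data below is made up for illustration. One caveat worth knowing: called with no arguments at all, Object.assign throws a TypeError, which is what an empty directory would produce, so passing a leading {} is a safe variation -

```javascript
// hypothetical per-entry results, shaped like what each recursive call resolves to
const results = [
  { file_1_1: "file_1_1 content\n" },
  { file_1_2: "file_1_2 content\n" },
  { level_2: { level_3_1: {} } },
]

// fold the single-key objects into one object; the leading {} is the merge
// target, so results[0] is not mutated
const merged = Object.assign({}, ...results)

// an empty directory yields an empty array; with the leading {} this is still valid
const empty = Object.assign({}, ...[])

console.log(merged)
console.log(empty) // {}
```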

If your console is truncating the output object, you can JSON.stringify it to see all keys and values -

// run it
dir2obj ("./level_1")
  .then (obj => JSON.stringify (obj, null, 2))
  .then (console.log, console.error)

Here's the output -

{
  "file_1_1": "file_1_1 content\n",
  "file_1_2": "file_1_2 content\n",
  "level_2": {
    "level_3_1": {
      "file_3_1_1": "file_3_1_1 content\n",
      "file_3_1_2": "file_3_1_2 content\n"
    },
    "level_3_2": {
      "file_3_2_1": "file_3_2_1 content\n",
      "file_3_2_2": "file_3_2_2 content\n",
      "level_4": {
        "file_4_1": "file_4_1 content\n",
        "file_4_2": "file_4_2 content\n"
      }
    }
  }
}

Refactor with generics

The program above can be simplified by extracting out a common function, parallel -

// parallel : ('a array promise, 'a -> 'b promise) -> 'b array promise
const parallel = async (p, f) =>
  Promise .all ((await p) .map (f))

// dir2obj : string -> object
const dir2obj = async (path = ".") =>
  (await stat (path)) .isFile ()
    ? String (await readFile (path))
    : parallel // <-- use generic
        ( readdir (path) // directory contents of path
        , p =>           // for each descendent path as p ...
            dir2obj (join (path, p))
              .then (obj => ({ [p]: obj }))
        )
        .then (results => Object.assign ({}, ...results)) // {} keeps empty directories from throwing
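Because parallel knows nothing about the file system, it can be tried on its own. In this sketch fakeReaddir and fakeTask are hypothetical stand-ins for readdir and the per-entry work -

```javascript
// same definition as above
const parallel = async (p, f) =>
  Promise.all((await p).map(f))

// hypothetical stand-ins: a promise of names, and an async task per name
const fakeReaddir = () => Promise.resolve(["a", "b", "c"])
const fakeTask = async name => name.toUpperCase()

// all tasks are started before any is awaited; results keep the input order
const demo = parallel(fakeReaddir(), fakeTask)

demo.then(console.log, console.error) // [ 'A', 'B', 'C' ]
```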

Including the root object

Notice the output does not contain the "root" object, { level_1: ... } . If this is desired, we can change the program like so -

const { basename } =
  require ("path")

const dir2obj = async (path = ".") =>
  ( { [basename (path)]: // <-- always wrap in object
      (await stat (path)) .isFile ()
        ? String (await readFile (path))
        : await parallel
            ( readdir (path)
            , p => dir2obj (join (path, p)) // <-- no more wrap
            )
            .then (results => Object.assign ({}, ...results)) // {} keeps empty directories from throwing
    }
  )

dir2obj ("./level_1/level_2/level_3_2/level_4") .then (console.log, console.error)

The root object is now keyed by the basename of the input path -

{
  "level_4": {
    "file_4_1": "file_4_1 content\n",
    "file_4_2": "file_4_2 content\n"
  }
}

This version of the program has a more correct behavior. The result will always be an object, even if the input path is a file -

dir2obj ("./level_1/level_2/level_3_2/level_4/file_4_2")
  .then (obj => JSON.stringify (obj, null, 2))
  .then (console.log, console.error)

Still returns an object -

{
  "file_4_2": "file_4_2 content\n"
}

Rewrite using imperative style without async-await

In a comment you remark on the "unreadable" style above, but I find boilerplate syntax and verbose keywords highly unpalatable. In a style I suspect you'll recognize as more familiar, take notice of all the added chars -

const dir2obj = function (path = ".") {
  return stat(path).then(stat => {
    if (stat.isFile()) {
      return readFile(path).then(String)
    }
    else {
      return readdir(path)
        .then(paths => paths.map(p => dir2obj(join(path, p))))
        .then(Promise.all.bind(Promise))
        .then(results => Object.assign({}, ...results)) // {} keeps empty directories from throwing
    }
  }).then(value => {
    return { [basename(path)]: value }
  })
}

Our variables are more difficult to see because we have words like "function", "return", "if", "else", and "then" interspersed through the entire program. Countless {} are added just so the keywords can even be used. It costs more to write more — let that digest for a moment.

It's slightly better with the parallel abstraction, but not much, imo -

const parallel = function (p, f) {
  return p
    .then(a => a.map(f))
    .then(Promise.all.bind(Promise))
}

const dir2obj = function (path = ".") {
  return stat(path).then(stat => {
    if (stat.isFile()) {
      return readFile(path).then(String)
    }
    else {
      return parallel
        ( readdir(path)
        , p => dir2obj(join(path, p))
        )
        .then(results => Object.assign({}, ...results)) // {} keeps empty directories from throwing
    }
  }).then(value => {
    return { [basename(path)]: value }
  })
}

When we look back at the functional-style program, we see each character printed on the screen as representative of some program semantic. p ? t : f evaluates to t if p is true, otherwise f. We don't need to write if (...) { ... } else { ... } every time. x => a takes x and returns a because that's what arrow functions do, so we don't need function (x) { ... } or "return" every time.
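As a small side-by-side, here is the same decision written both ways. The names kindStatement and kindExpression are made up for this illustration -

```javascript
// statement style: keywords and braces carry the control flow
function kindStatement (isFile) {
  if (isFile) {
    return "file"
  }
  else {
    return "directory"
  }
}

// expression style: the ternary is itself the value of the arrow function
const kindExpression = isFile =>
  isFile ? "file" : "directory"

console.log(kindStatement(true), kindExpression(true))   // file file
console.log(kindStatement(false), kindExpression(false)) // directory directory
```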

I originally learned C-style languages, so having {} everywhere was a familiar feeling. Over time, I've learned to look at p ? t : f or x => a and instantly understand exactly what they mean, and I've come to appreciate not having all the other words and arcane symbols in my way.

There's an added benefit to writing programs in an expression-based style, too. Expressions are so powerful because they can be composed with one another to create more complex expressions. We begin to blur the lines between program and data, where everything is just pieces that can be combined like Lego. Even functions (sub-programs) become ordinary data values that we manipulate and combine, just like any other data.

Imperative programs rely on side-effects and imperative statements cannot be combined with one another. Instead, more variables are created to represent intermediate state, which means even more text on the screen and more cognitive load in the programmer's mind. In imperative style, we're forced to think about programs, functions, statements, and data as different kinds of things, and so there is no uniform way to manipulate and combine them.

Related: async and await are not statements

Still, both variants have the exact same behavior as the functional-style program. Ultimately the program's style is left to you, the programmer. Choose any style that you like best.
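For something closer to walkSync in the question, here is a synchronous sketch. dir2objSync is a hypothetical name, and like the first version above it does not wrap the result in a root key. The point to notice is that each call builds and returns its own object, so nothing ever has to check whether a deeply nested key already exists in a shared vo -

```javascript
const fs = require("fs")
const { join } = require("path")

// dir2objSync is a hypothetical name; it blocks while reading, like walkSync
const dir2objSync = function (dir) {
  const result = {}
  for (const name of fs.readdirSync(dir)) {
    const full = join(dir, name)
    result[name] = fs.statSync(full).isDirectory()
      ? dir2objSync(full)             // a directory becomes a nested object
      : fs.readFileSync(full, "utf8") // a file becomes its content
  }
  return result
}
```

With the fixture from the top of this answer in place, console.log(JSON.stringify(dir2objSync("./level_1"), null, 2)) should print the same structure as the async version. Blocking calls are usually fine for a one-off startup step such as loading views, but the promise-based version is the safer choice inside a running server.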


Similar problem

To gain more intuition on how to solve this kind of problem, please see this related Q&A

OK, after soooome fiddling around I did manage to get it to work like it should. Thanks to @user633183 for the kick off …

Changed what it returns for a file, plus a bit of other stuff ... like now I know I can have a fairly complex method in a ternary operator ; ). Just not sure I would write code this way, as I find it way too hard to understand and therefore maintain ... not even thinking about how other devs would feel about it. Well, never mind. Always good to learn something new. And if others find use for it, here's the final version, which returns an object of precompiled Handlebars templates easily accessible through the folder structure of your views, like:

let template = [ global.view ].path.to.view.based.on.dir.structure.using.dot.syntax

In this case I've attached the output to a global view, and from there I can access all templates.

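// imports were not shown in the original post; the package names below are assumed
const { readdir, readFile, stat } = require("fs").promises
const { join } = require("path")
const Handlebars = require("handlebars")
const htmlclean = require("htmlclean")

const dir2obj = async ( path = "." ) =>
    ( await stat( path ) ).isFile()
        ? readFile( path )
            .then(function ( template ) {
                // precompile the minified source, then evaluate it into a template function
                let tpl = 'Handlebars.template(' + Handlebars.precompile( htmlclean( template.toString() ) ) + ')';
                return eval( tpl );
            })
            .catch(function ( err ) {
                console.log("Error", err);
            })
        : Promise.all( ( await readdir( path ) )
            .map( p =>
                dir2obj( join( path, p ) )
                    .then( ( obj ) => {
                        // strip the file extension so keys work with dot syntax
                        return { [ p.split('.')[0] ] : obj }
                    })
            )
        )
        .then(function ( results ) {
            // {} keeps empty directories from throwing
            return Object.assign({}, ...results);
        })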

// Use
dir2obj ( dir )
.then( console.log )
