Nested stream operations in Highland.js

Question

I have a stream of directories from the readdirp module.

I want to:

search each directory for a file matching a regex (e.g. README.*)
read the first line of that file that does not start with a #
print out each directory along with that first non-heading line of its README
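
The line-selection rule (the first line that is neither blank nor a # heading, which is what the question's takeUntil predicate tries to express) can be sketched as a plain function over a file's text. This is a TypeScript illustration rather than Highland code, and firstNonHeadingLine is a hypothetical name. In the CoffeeScript predicate the call needs explicit parentheses: written as not line.startsWith '#' and line isnt '', the implicit call takes the whole and-expression as its argument, so it parses as not line.startsWith('#' and line isnt '').

```typescript
// Return the first line that is neither blank nor a heading
// (i.e. does not start with '#'), or undefined if there is none.
function firstNonHeadingLine(text: string): string | undefined {
  const lines = text.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    // Keep the first line that is non-blank AND does not start with '#'.
    if (line !== '' && line.charAt(0) !== '#') {
      return line;
    }
  }
  return undefined;
}

const readme = '# my-project\n\n## Overview\nA tool for walking directories.\nMore text.';
console.log(firstNonHeadingLine(readme)); // A tool for walking directories.
```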

I am trying to do this using streams and Highland.js.

I am stuck trying to process a stream of all files inside each directory.

h = require 'highland'
fs = require 'fs'
readdirp = require 'readdirp'

dirStream = readdirp root: root, depth: 0, entryType: 'directories'

dirStream = h(dirStream)
  .filter (entry) -> entry.stat.isDirectory()
  .map (entry) ->

    # Search all files in the directory for README.
    fileStream = readdirp root: entry.fullPath, depth: 0, entryType: 'files', fileFilter: '!.DS_Store'
    fileStream = h(fileStream).filter (entry) -> /README\..*/.test entry.name
    fileStream.each (file) ->
      readmeStream = fs.createReadStream file.fullPath
      h(readmeStream)
        .split()
        .takeUntil (line) -> not line.startsWith('#') and line isnt ''
        .last(1)
        .toArray (comment) ->
          # TODO: How do I access `comment` asynchronously to include in the return value of the map?

    return {name: entry.name, comment: comment}

Answer 1

It's best to consider Highland streams as immutable, and to think of operations like filter and map as returning new streams that depend on the old stream, rather than as modifying the old stream.

Also, Highland's transforms are lazy: you should only call each or toArray when you absolutely need the data right now.
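
To make the laziness point concrete, here is a toy model in TypeScript. LazyStream is hypothetical and is not Highland's implementation: each transform returns a new stream object holding a recipe, and no work happens until the stream is consumed with toArray. One difference from Highland: this toy re-runs its recipe every time toArray is called, whereas a Highland stream can only be consumed once (fork or observe are needed to share it).

```typescript
// A toy lazy stream that stores a recipe (a thunk producing values) instead
// of the values themselves. A model of the idea only, not Highland's internals.
class LazyStream<A> {
  constructor(private readonly produce: () => A[]) {}

  // Each transform returns a NEW stream wrapping a new recipe; nothing runs yet.
  map<B>(f: (a: A) => B): LazyStream<B> {
    return new LazyStream<B>(() => this.produce().map((a) => f(a)));
  }

  filter(keep: (a: A) => boolean): LazyStream<A> {
    return new LazyStream<A>(() => this.produce().filter((a) => keep(a)));
  }

  // Consuming the stream is what finally runs the whole recipe chain.
  toArray(): A[] {
    return this.produce();
  }
}

let calls = 0;
const source = new LazyStream<number>(() => [1, 2, 3, 4]);
const doubledEvens = source
  .filter((n) => n % 2 === 0)
  .map((n) => {
    calls++;
    return n * 2;
  });

console.log(calls); // 0: building the pipeline did no work
console.log(doubledEvens.toArray()); // [ 4, 8 ]
console.log(calls); // 2: map ran once per surviving element
```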

The standard way of asynchronously mapping a stream is flatMap. It's like map, but the function you give it should return a stream, and the stream you get from flatMap is the concatenation of all the returned streams. Because the new stream reads each returned stream in order, one after another, it can be used to sequence asynchronous processes.
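
The concatenation behaviour can be seen with arrays standing in for streams. This eager, synchronous TypeScript sketch is only an analogy: flatMap here is a hypothetical helper over arrays, and listing is made-up data mapping directory names to file names. Each directory's matches come out together, in the order of the source, and a directory with no match simply contributes nothing.

```typescript
// flatMap over arrays: apply f to each element and concatenate the results
// in source order. Arrays stand in for streams here.
function flatMap<A, B>(xs: A[], f: (a: A) => B[]): B[] {
  let out: B[] = [];
  for (let i = 0; i < xs.length; i++) {
    out = out.concat(f(xs[i]));
  }
  return out;
}

// Hypothetical directory listing: directory name -> file names.
const listing: { [dir: string]: string[] } = {
  alpha: ['README.md', 'index.js'],
  beta: ['main.js'],
  gamma: ['README.txt', 'README.rst'],
};

// For each directory, return a (possibly empty) array of README paths.
const readmes = flatMap(Object.keys(listing), (dir) =>
  listing[dir]
    .filter((file) => /^README\./.test(file))
    .map((file) => dir + '/' + file)
);

console.log(readmes); // [ 'alpha/README.md', 'gamma/README.txt', 'gamma/README.rst' ]
```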

I'd modify your example as follows (I've also renamed some variables for clarity):

h = require 'highland'
fs = require 'fs'
readdirp = require 'readdirp'

readmeStream = h(readdirp root: root, depth: 0, entryType: 'directories')
  .filter (dir) -> dir.stat.isDirectory()
  .flatMap (dir) ->
    # Search all files in the directory for README.
    h(readdirp root: dir.fullPath, depth: 0, entryType: 'files', fileFilter: '!.DS_Store')
      .filter (file) -> /^README\./.test file.name
      .flatMap (file) ->
        # Read the README as UTF-8 text (file.fullPath, not the bare file name).
        h(fs.createReadStream file.fullPath, encoding: 'utf8')
          .split()
          .find (line) -> line isnt '' and not line.startsWith('#') # first non-heading line
          .map (comment) -> {name: dir.name, comment}

Let's take a walk through the types in this code. First, note that flatMap has type (in Haskell-ish notation) Stream a → (a → Stream b) → Stream b, i.e. it takes a stream containing things of type a and a function that takes an a and returns a stream of bs, and it returns a stream of bs. It's standard for collection types (such as streams and arrays) to implement flatMap by concatenating the returned collections.
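
The two signatures can be written down as TypeScript generics, with ReadonlyArray standing in for Highland's Stream (an assumption for illustration; flatMap and map below are hypothetical helpers over arrays, not Highland's API). The sketch also shows why the two types look so alike: map f behaves the same as flatMap with a function that wraps each result in a one-element stream.

```typescript
// ReadonlyArray stands in for Highland's Stream type (an assumption for illustration).
type Stream<A> = ReadonlyArray<A>;

// flatMap :: Stream a -> (a -> Stream b) -> Stream b
// Apply f to every element and concatenate the resulting streams in order.
function flatMap<A, B>(s: Stream<A>, f: (a: A) => Stream<B>): Stream<B> {
  const out: B[] = [];
  s.forEach((a) => {
    f(a).forEach((b) => {
      out.push(b);
    });
  });
  return out;
}

// map :: Stream a -> (a -> b) -> Stream b, obtained from flatMap by wrapping
// each result in a one-element stream.
function map<A, B>(s: Stream<A>, f: (a: A) => B): Stream<B> {
  return flatMap<A, B>(s, (a) => [f(a)]);
}

const words: Stream<string> = ['a', 'bb', 'ccc'];
console.log(map(words, (w) => w.length)); // [ 1, 2, 3 ]
console.log(flatMap(words, (w) => w.split(''))); // [ 'a', 'b', 'b', 'c', 'c', 'c' ]
```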

h(readdirp root: root, depth: 0, entryType: 'directories')

Let's say this has type Stream Directory. The filter doesn't change the type, so the first flatMap has type Stream Directory → (Directory → Stream b) → Stream b. Let's see what its function returns:

h(readdirp root: dir.fullPath, depth: 0, entryType: 'files', fileFilter: '!.DS_Store')

Call this a Stream File, so the second flatMap has type Stream File → (File → Stream b) → Stream b.

h(fs.createReadStream file.fullPath, encoding: 'utf8')

This is a Stream String. split and the line-selecting step after it don't change that, so what does the map do? map is very similar to flatMap: its type is Stream a → (a → b) → Stream b. In this case a is String and b is the object type {name : String, comment : String}, so map returns a stream of that object, which is what the function passed to the second flatMap returns. Stepping up a level, b in the second flatMap is that object type, so the function passed to the first flatMap also returns a stream of the object, and the entire stream is a Stream {name : String, comment : String}.
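
The type walkthrough can be mirrored in a small TypeScript sketch in which arrays stand in for streams. The interfaces DirEntry, FileEntry and Summary and the in-memory dirs data are hypothetical (nothing here calls readdirp or fs); the point is that each nesting level has the type described above, and the whole expression ends up as a Summary[], the array analogue of Stream {name : String, comment : String}.

```typescript
// Hypothetical in-memory stand-ins for what readdirp and fs would provide.
interface FileEntry { name: string; contents: string; }
interface DirEntry { name: string; files: FileEntry[]; }
interface Summary { name: string; comment: string; }

// flatMap :: Stream a -> (a -> Stream b) -> Stream b, with concatenation.
function flatMap<A, B>(xs: A[], f: (a: A) => B[]): B[] {
  let out: B[] = [];
  for (let i = 0; i < xs.length; i++) out = out.concat(f(xs[i]));
  return out;
}

const dirs: DirEntry[] = [
  { name: 'alpha', files: [{ name: 'README.md', contents: '# Alpha\n\nParses config files.' }] },
  { name: 'beta', files: [{ name: 'main.js', contents: 'console.log(1);' }] },
  { name: 'gamma', files: [{ name: 'README.txt', contents: 'Renders charts.\nMore.' }] },
];

// DirEntry[] -> FileEntry[] -> string[] -> Summary[], one level per flatMap/map.
const summaries: Summary[] = flatMap(dirs, (dir: DirEntry): Summary[] =>
  flatMap(
    dir.files.filter((file) => /^README\./.test(file.name)),
    (file: FileEntry): Summary[] =>
      file.contents
        .split('\n')
        .filter((line) => line !== '' && line.charAt(0) !== '#')
        .slice(0, 1) // keep only the first non-heading line
        .map((comment) => ({ name: dir.name, comment: comment }))
  )
);

console.log(summaries);
```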

Note that because of Highland's laziness, this doesn't actually start any streaming or processing. You need to call each or toArray to cause a thunk and start the pipeline. In each, the callback will be called with your object. Depending on what you want to do with the comments, it might be best to flatMap some more (if you're writing them to a file, for example).

Well, I didn't mean to write an essay. Hope this helps.

solution1
4 2015-07-09 10:35:36
