Thinking in JavaScript promises (Bluebird in this case)

Question

I'm trying to get my head around some less-than-trivial promise/asynchronous use cases. In the example I'm wrestling with at the moment, I have an array of books, returned from a knex query (a thenable array), that I wish to insert into a database:

books.map(function(book) {

  // Insert into DB

});

Each book item looks like:

var book = {
    title: 'Book title',
    author: 'Author name'
};

However, before I insert each book, I need to retrieve the author's ID from a separate table since this data is normalised. The author may or may not exist, so I need to:

Check if the author is present in the DB
If it is, use this ID
Otherwise, insert the author and use the new ID

Moreover, all of the above operations are asynchronous.

I can just use a promise within the original map (fetch and/or insert ID) as a prerequisite of the insert operation. The problem, though, is that everything runs asynchronously, so the code may well insert duplicate authors: the initial check-if-author-exists step is decoupled from the insert-a-new-author block.
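
The race can be reproduced without a database. The sketch below is a minimal illustration, assuming an in-memory array in place of the authors table; `findAuthor`, `insertAuthor`, `findOrCreateAuthor` and `delay` are hypothetical stand-ins for the real knex queries and their latency.

```javascript
// Hypothetical in-memory stand-in for the authors table (no real DB involved).
const authorsTable = [];
let nextId = 1;

// Simulated query latency, so concurrent operations can interleave.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Stand-in for "SELECT ... WHERE name = ?": resolves to the row or undefined.
function findAuthor(name) {
  return delay(10).then(() => authorsTable.find(a => a.name === name));
}

// Stand-in for "INSERT INTO authors": resolves to the new row.
function insertAuthor(name) {
  return delay(10).then(() => {
    const row = { id: nextId++, name: name };
    authorsTable.push(row);
    return row;
  });
}

// Naive find-or-create: the check and the insert are separate async steps.
function findOrCreateAuthor(name) {
  return findAuthor(name).then(row => row || insertAuthor(name));
}

const books = [
  { title: 'Book 1', author: 'Alice' },
  { title: 'Book 2', author: 'Alice' },
];

// Both lookups finish before either insert, so 'Alice' ends up inserted twice.
const naiveRun = Promise.all(books.map(book => findOrCreateAuthor(book.author)))
  .then(() => authorsTable.filter(a => a.name === 'Alice').length);

naiveRun.then(count => console.log('rows for Alice:', count));
```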

I can think of a few ways to achieve the above but they all involve splitting up the promise chain and generally seem a bit messy. This seems like the kind of problem that must arise quite commonly. I'm sure I'm missing something fundamental here!

Any tips?

Answer 1

Let's assume that you can process each book in parallel. Then everything is quite simple (using only the ES6 API):

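Promise
  .all(books.map(book => {
    // Look up the author; if the lookup rejects (not found), create it
    return getAuthor(book.author)
          .catch(createAuthor.bind(null, book.author))
          // Replace the author name with the author's ID, then save
          .then(author => Object.assign(book, { author: author.id }))
          .then(saveBook);
  }))
  .then(() => console.log('All done'));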
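
To see the chain run end to end, here is a self-contained sketch with hypothetical in-memory versions of `getAuthor`, `createAuthor` and `saveBook` (not the asker's real functions). It assumes `getAuthor` rejects when the author is missing, which is what the `.catch()` fallback relies on.

```javascript
// Hypothetical in-memory stand-ins for the real database calls.
const authors = new Map([['Alice', { id: 1, name: 'Alice' }]]);
let nextAuthorId = 2;

// Rejects when the author is missing; the .catch() below relies on this.
function getAuthor(name) {
  const row = authors.get(name);
  return row ? Promise.resolve(row) : Promise.reject(new Error('Author not found: ' + name));
}

// `name` comes from bind(); the rejection reason arrives as the second argument.
function createAuthor(name, reason) {
  const row = { id: nextAuthorId++, name: name };
  authors.set(name, row);
  return Promise.resolve(row);
}

// Pretend to persist the book and hand it back.
function saveBook(book) {
  return Promise.resolve(book);
}

const books = [
  { title: 'Book 1', author: 'Alice' }, // existing author
  { title: 'Book 2', author: 'Bob' },   // created on the fly
];

const parallelRun = Promise.all(books.map(book =>
  getAuthor(book.author)
    .catch(createAuthor.bind(null, book.author))
    .then(author => Object.assign(book, { author: author.id }))
    .then(saveBook)
));

parallelRun.then(saved => console.log(saved));
```

With distinct authors this behaves as intended; with two books by the same new author, both lookups can reject before either `createAuthor` runs, which is the race described below.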

The problem is that there is a race condition between getting an author and creating a new one. Consider the following order of events:

we try to get author A for book B;
getting author A fails;
we request creating author A, but it is not created yet;
we try to get author A for book C;
getting author A fails;
we request creating author A (again!);
first request completes;
second request completes.

Now we have two instances of A in the author table. This is bad! To solve this problem we can use the traditional approach: locking. We need to keep a table of per-author locks. When we send a creation request, we acquire the appropriate lock; after the request completes, we release it. All other operations involving the same author must acquire the lock before doing anything.
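
A minimal sketch of this per-author locking in plain JavaScript, assuming a single Node.js process. `withLock` is a hypothetical helper (not a library API) that chains each task onto the previous one for the same key, so two check-then-insert sequences for one author never interleave; `find` and `insert` are hypothetical stand-ins for real queries.

```javascript
// Per-key lock table: maps an author name to the tail of its queue of tasks.
const locks = new Map();

// Run `task` only after every earlier task for the same key has settled.
function withLock(key, task) {
  const previous = locks.get(key) || Promise.resolve();
  const result = previous.then(() => task());
  // The next caller waits for this task to settle but never sees its error.
  const tail = result.then(() => undefined, () => undefined);
  locks.set(key, tail);
  // Release: drop the entry once settled, unless someone queued behind us.
  tail.then(() => {
    if (locks.get(key) === tail) locks.delete(key);
  });
  return result;
}

// Hypothetical in-memory authors table with simulated query latency.
const table = [];
let nextId = 1;
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
const find = name => delay(10).then(() => table.find(a => a.name === name));
const insert = name => delay(10).then(() => {
  const row = { id: nextId++, name: name };
  table.push(row);
  return row;
});

// The check and the insert now run while holding the author's lock.
const findOrCreateLocked = name =>
  withLock(name, () => find(name).then(row => row || insert(name)));

const lockedRun = Promise.all(['Alice', 'Alice', 'Bob'].map(findOrCreateLocked));

lockedRun.then(rows => console.log(rows.map(r => r.id)));
```

An in-memory lock only serialises work inside one process; other processes or servers can still race, which is why a unique constraint on the author name in the database remains worthwhile.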

This seems hard, but in our case it can be simplified a lot, since we can use our request promises instead of locks:

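// In-flight author requests, keyed by author name
const authorPromises = {};

function getAuthor(authorName) {

  // A request for this author is already in flight: share its promise
  if (authorPromises[authorName]) {
    return authorPromises[authorName];
  }

  const promise = getAuthorFromDatabase(authorName)
    .catch(createAuthor.bind(null, authorName))
    .then(author => {
      // Request finished: later calls will find the author in the DB
      delete authorPromises[authorName];
      return author;
    });
  // Note: if the chain rejects (e.g. createAuthor fails), the delete above
  // never runs, so the rejected promise stays cached for this author.

  authorPromises[authorName] = promise;

  return promise;
}

Promise
  .all(books.map(book => {
    return getAuthor(book.author)
          .then(author => Object.assign(book, { author: author.id }))
          .then(saveBook);
  }))
  .then(() => console.log('All done'));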

That's it! Now, if a request for an author is already in flight, the same promise is returned.
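
Here is a self-contained sketch of the same in-flight promise sharing, with hypothetical in-memory versions of `getAuthorFromDatabase` and `createAuthor`. Two deliberate differences from the snippet above, both design choices rather than the answer's code: the lookup resolves to `undefined` for a missing author instead of rejecting, so a genuine database error is not treated as "not found", and the cache entry is removed with `Promise.prototype.finally` (Node.js 10+), so a rejected promise is evicted as well.

```javascript
// Hypothetical in-memory authors table with simulated query latency.
const table = [];
let nextId = 1;
let insertCount = 0;
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Lookup resolves to the row, or undefined if the author does not exist.
const getAuthorFromDatabase = name =>
  delay(10).then(() => table.find(a => a.name === name));

const createAuthor = name =>
  delay(10).then(() => {
    insertCount++;
    const row = { id: nextId++, name: name };
    table.push(row);
    return row;
  });

// Promises for lookups/creations still in flight, keyed by author name.
const inFlight = new Map();

function getAuthor(name) {
  // Reuse the pending promise so concurrent callers share one lookup/insert.
  if (inFlight.has(name)) return inFlight.get(name);
  const promise = getAuthorFromDatabase(name)
    .then(row => row || createAuthor(name))
    // Evict on success *and* failure, so a rejected promise is never cached.
    .finally(() => { inFlight.delete(name); });
  inFlight.set(name, promise);
  return promise;
}

const sharedRun = Promise.all(['Alice', 'Alice', 'Alice'].map(name => getAuthor(name)))
  .then(rows => ({ ids: rows.map(r => r.id), insertCount, pending: inFlight.size }));

sharedRun.then(result => console.log(result));
```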

Answer 2

Here is how I would implement it. I think some important requirements are:

No duplicate authors are ever created (this should be enforced by a constraint in the database itself too).
If the server stops replying midway, no inconsistent data is inserted.
It is possible to enter multiple authors.
Don't make n queries to the database for n things, avoiding the classic "n+1" problem.
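
To make the "n+1" point concrete, here is a minimal sketch that only counts round trips against a hypothetical in-memory table; `findByName` and `findByNames` stand in for a per-name `WHERE name = ?` query and a single `whereIn("name", names)` query.

```javascript
// Hypothetical in-memory authors table plus a counter of database round trips.
const authorsTable = [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }];
let queryCount = 0;

// Stand-in for a per-name query (WHERE name = ?): one round trip per call.
function findByName(name) {
  queryCount++;
  return Promise.resolve(authorsTable.filter(a => a.name === name));
}

// Stand-in for one whereIn query (WHERE name IN (...)): one round trip in total.
function findByNames(names) {
  queryCount++;
  return Promise.resolve(authorsTable.filter(a => names.includes(a.name)));
}

const names = ['Alice', 'Bob', 'Carol', 'Alice'];

const comparison = Promise.all(names.map(name => findByName(name)))
  .then(() => {
    const perItemQueries = queryCount;
    queryCount = 0;
    return findByNames(names).then(() => ({ perItemQueries, batchedQueries: queryCount }));
  });

comparison.then(counts => console.log(counts));
```

The per-name approach costs one round trip per book, while the batched version costs one regardless of how many names it receives.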

I'd use a transaction to make sure that updates are atomic: if the operation is run and the client dies in the middle, no authors are created without books. It's also important that a temporary failure does not cause a memory leak (as in the answer with the authors map, which keeps failed promises).

var Promise = require("bluebird");

knex.transaction(Promise.coroutine(function*(t) {
    // get the author names of the books (books is the thenable knex query)
    var names = yield books.map(x => x.author);
    // deduplicate, so two books by the same new author don't insert it twice
    var authors = Array.from(new Set(names));
    // name should be indexed (and unique); this is a single query
    var inDb = yield t("authors").whereIn("name", authors);
    var notIn = authors.filter(author => !inDb.some(row => row.name === author));
    // now, perform a single multi-row insert on the transaction
    // I'm assuming PostgreSQL here (returning IDs); this is a bit different for SQLite
    var ids = yield t("authors").insert(notIn.map(name => ({ name: name }))).returning("id");
    // update books _inside the transaction_ now with the IDs array
})).then(() => console.log("All done!"));

This has the advantage of making only a fixed number of queries, and it is likely to be safer and to perform better. Moreover, your database is never left in an inconsistent state (although you may have to retry the operation when multiple instances run it concurrently).
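
The step left as a comment at the end of the transaction can be sketched as two plain functions. `missingAuthors` and `attachAuthorIds` are hypothetical helpers, not knex APIs; the sketch assumes the multi-row insert returns `{ id, name }` rows (for example via `.returning(['id', 'name'])` on PostgreSQL), so IDs are matched to books by name instead of by the position of the returned IDs.

```javascript
// Author names (deduplicated) that are not already present in the database.
function missingAuthors(books, existingRows) {
  const existing = new Set(existingRows.map(row => row.name));
  const names = new Set(books.map(book => book.author));
  return Array.from(names).filter(name => !existing.has(name));
}

// Copy each book, replacing the author name with the matching author ID.
function attachAuthorIds(books, authorRows) {
  const idByName = new Map(authorRows.map(row => [row.name, row.id]));
  return books.map(book => {
    const id = idByName.get(book.author);
    if (id === undefined) throw new Error('No ID for author: ' + book.author);
    return Object.assign({}, book, { author: id });
  });
}

const books = [
  { title: 'Book 1', author: 'Alice' },
  { title: 'Book 2', author: 'Bob' },
  { title: 'Book 3', author: 'Bob' },
];
// Made-up rows standing in for the whereIn result and the insert's returned rows.
const existing = [{ id: 7, name: 'Alice' }];
const toInsert = missingAuthors(books, existing); // ['Bob'], only once
const inserted = [{ id: 8, name: 'Bob' }];
const rowsToSave = attachAuthorIds(books, existing.concat(inserted));

console.log(toInsert, rowsToSave);
```

Inside the coroutine, `rowsToSave` could then be written with one more multi-row insert on `t`, keeping the total number of queries fixed.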

2 answers

solution1
8 ACCEPTED 2015-05-12 14:40:10

solution2
3 2015-05-13 06:10:02
