Promise closure within loop

I receive rows of data from Kafka every second. For each batch of data, I insert the rows into my database.

My app keeps reading the last message and id of each batch. The issue is that the promises are not running in series; they run concurrently after one batch is finished, and they all keep reading the same message and id. I want each promise to have its own message and id, as defined by the order in which they came in from the for-loop in the first function.

I think I need to use closures, but I am not sure how to apply them here. I don't want to use timers!

Thanks!

// This is live data, coming in concurrently, forever. Promises from previous batch must be resolved before the next batch is received.
batchOfRows.on('message', function (data) {
    for (var i = 0; i < batchOfRows.rows.length; i++) {
        validate(batchOfRows.rows[i])
            .then(result => console.log(result))
            .catch(error => console.log(error));
    }
});

// For each row received, give it an ID and then insert into the DB
function validate(data) {
    return new Promise((resolve, reject) => {
        message = data;
        id = message.date + message.location
        DB.execute('select * from table1 where id = ?', id) // This is a promise function provided by the database driver (Cassandra)
            .then(result => {
                // Insert into the table at this ID
                insertIntoDB(message, id)
                    .then(result => resolve(result))
                    .catch(error => reject(error));
            })
            .catch(error => {
                reject(error);
            });
    });
}

// Inserting into DB
function insertIntoDB(message, id) {
    return new Promise((resolve, reject) => {
        query = "insert into table2 where id = ? and messageBody = ?";

        DB.execute(query, [id, JSON.stringify(message)])
            .then(result => resolve("Successfully inserted message ID " + id))
            .catch(error => reject("Error inserting!"));
    });
}

EDIT (danh's solution):

var kafka = require('kafka-node');
client = new kafka.Client("localhost:2181"), Consumer = kafka.Consumer;
// This is like an event listener.
batchOfRows = new Consumer(
    client, [{
        topic: 'my_topic',
        partition: 0,
        offset: 0
    }], {
        fromOffset: false
    }
);

let results = [];
let promises = Promise.resolve();

function processQueue() {
    queue.forEach(element => {
        promises = promises.then(element.map(processElement)).then(elementResult => {
            // results.push(elementResult); // Don't want result to increase in size! I have put this inside insertDB then I clear it below
            console.log(results.length); // First received batch prints: 0. Second received batch prints 72. Third received batch prints 75
            results = [];  
            queue.shift();
        });
    });
}

batchOfRows.on('message', function (data) {
    console.log(batchOfRows.value.length); // First received batch prints: 72. Second received batch prints 75. Third received batch prints 76
    queue.push(batchOfRows.rows);
    processQueue();
});

function processElement(data) {
    const id = data.date + data.location
    return  DB.execute('select * from table1 where id = ?', id)
              .then(result => insertIntoDB(data, id).then(() => result));
}

function insertIntoDB(message, id) {
    const query = "insert into table2 where id = ? and messageBody = ?";
    return DB.execute(query, [id, JSON.stringify(message)])
        .then(result => {
            // Pushing the result here
            results.push(result); // Seems like it does not push the results from the first batch from batchOfRows until it receives the second batch
            console.log("Test") // On the first batch prints "Test" 72 times right away
        });
}

EDIT: I have modified the processQueue function slightly by adding element.map(processElement), because the batches received from batchOfRows are actually arrays, and I need to perform the DB query for each item inside that array.

I have also removed results.push(elementResult) because elementResult is actually undefined for some reason. I have moved it into insertIntoDB, where it is now results.push(result). This may be where the error originates (I don't know how to return the result from insertIntoDB back to the calling promise function processQueue).

If you take a glance at insertIntoDB: if I console.log("Test") there, it prints "Test" as many times as there are elements in the batchOfRows array, which suggests that all promises in that batch have resolved. So on the first batch/message, if there are 72 rows, it prints "Test" 72 times. But if I change that console.log("Test") to results.push(result), or even results.push("test"), and then print results.length, it still gives me 0 until the second batch completes, even though I expect the length to be 72.

It might be helpful to abstract the ideas a little and represent them explicitly in data (rather than data retained implicitly in the promises). Start with a queue:

let queue = [];

Add stuff to the queue with queue.push(element), and get and remove items in order of arrival with element = queue.shift().

Our goal is to process whatever is on the queue, in order, saving the results in order. The processing itself is async, and we want to finish one queue item before starting the next, so we need a chain of promises (called promises) to process the queue:

let results = [];
let promises = Promise.resolve();

function processQueue() {
    queue.forEach(element => {
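        // Pass a function to .then(); calling processElement(element) here would start it immediately
        promises = promises.then(() => processElement(element)).then(elementResult => {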
            results.push(elementResult);
            queue.shift();
        });
    });
}

We can convince ourselves that this is right without even thinking about what processElement() does, so long as it returns a promise. (In the OP's case, that promise is a promise to deal with an array of "rows".) processElement() will do its thing, and the result (an array of results in the OP's case) will get pushed to results.

Once we are confident that the ordering of operations makes sense, the rest is simple: when a new batch arrives, add it to the queue, and then process whatever is on the queue:
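
The difference between handing `.then()` a function and handing it the result of calling one is easy to miss, so here is a small self-contained sketch of the chaining idea. `fakeProcess` is a hypothetical stand-in for `processElement()`; it deliberately takes longer for earlier items, which would scramble the completion order if the items ran concurrently.

```javascript
// Hypothetical stand-in for processElement(): earlier items take longer.
function fakeProcess(item, log) {
    return new Promise(resolve => {
        setTimeout(() => {
            log.push('done ' + item.name);
            resolve(item.name.toUpperCase());
        }, item.delay);
    });
}

function runSerially(items) {
    const log = [];
    const results = [];
    let chain = Promise.resolve();

    items.forEach(item => {
        // .then() receives a FUNCTION, so fakeProcess is not called until
        // the previous link of the chain has settled.
        chain = chain
            .then(() => fakeProcess(item, log))
            .then(result => { results.push(result); });
    });

    // Resolve with both arrays once the whole chain has finished.
    return chain.then(() => ({ log, results }));
}

const items = [
    { name: 'a', delay: 30 },
    { name: 'b', delay: 20 },
    { name: 'c', delay: 10 }
];

runSerially(items).then(({ log, results }) => {
    console.log(log);     // completion order matches arrival order
    console.log(results);
});
```

Writing `chain.then(fakeProcess(item, log))` instead would call `fakeProcess` for every item right away and give `.then()` a promise rather than a function. `.then()` ignores non-function arguments, so the following handler would receive `undefined`.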

batchOfRows.on('message', function (data) {
    queue.push(batchOfRows.rows);
    processQueue();
});

We just need to define processElement(). Use @YuryTarabanko's helpful suggestions for that here (and leave his answer marked correct, IMO):

function processElement(data) {
    const id = data.date + data.location
    return  DB.execute('select * from table1 where id = ?', id)
              .then(result => insertIntoDB(data, id).then(() => result));
}

function insertIntoDB(message, id) {
    const query = "insert into table2 where id = ? and messageBody = ?";
    return DB.execute(query, [id, JSON.stringify(message)]);
}

One nice side effect of this is that you can measure progress. If the inputs are arriving too fast, then the expression:

queue.length - results.length

... will grow over time.

EDIT: Looking at the newer code, I am puzzled as to why a query is done for each row (each element in batchOfRows.rows). Since the result of that query is ignored, don't do it...

function processElement(data) {
    const id = data.date + data.location
    // we know everything we need to know to call insert (data and id)
    // just call it and return what it returns :-)
    return insertIntoDB(data, id);
}

I understand now that this will be a long-running task and that it shouldn't accumulate results (even linearly). The cleaner fix for that is to remove every reference to the results array that I suggested. The minimal version of insert just inserts and returns the result of the insertion...

function insertIntoDB(message, id) {
    const query = "insert into table2 where id = ? and messageBody = ?";
    return DB.execute(query, [id, JSON.stringify(message)]);
}

I think you added some code to log results (a better test that it worked would be to check the database via some outside process), but if you want to log, just remember to pass the result value through after logging.

anyPromise.then(result => {
    console.log(result);
    return result;  // IMPORTANT
})
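
As a minimal illustration of why that `return` matters (the value `42` and the helper names `logOnly` and `logAndPass` are made up for this example): a handler that logs without returning makes the next handler in the chain receive `undefined`.

```javascript
function logOnly(promise) {
    // Logs but returns nothing, so the chain continues with undefined.
    return promise.then(result => {
        console.log('logOnly saw', result);
    });
}

function logAndPass(promise) {
    // Logs and returns the value, so the next handler still receives it.
    return promise.then(result => {
        console.log('logAndPass saw', result);
        return result;
    });
}

logOnly(Promise.resolve(42)).then(value => console.log(value));    // undefined
logAndPass(Promise.resolve(42)).then(value => console.log(value)); // 42
```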

You have various antipatterns in your code. First, you don't need to create a promise manually; you will likely never need to call new Promise. Second, you are breaking promise chains by not returning the nested promise from within the onFulfilled handler. And finally, you are polluting the global scope by not declaring variables, as in id = message.date + message.location.
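
The effect of the undeclared variables can be reproduced without Kafka or Cassandra. The sketch below is only an illustration: `fakeQuery` is a hypothetical stand-in for `DB.execute` that resolves on a later tick, `sharedId` plays the role of the implicit global, and the row shape (`date`, `location`) is taken from the question.

```javascript
// Hypothetical stand-in for DB.execute: resolves asynchronously.
function fakeQuery() {
    return new Promise(resolve => setTimeout(resolve, 0));
}

let sharedId; // plays the role of the implicit global `id` in the question

// Every call overwrites the same variable before any query has finished.
function validateShared(row) {
    sharedId = row.date + row.location;
    return fakeQuery().then(() => sharedId);
}

// `const` gives each call its own binding, captured by the closure.
function validateScoped(row) {
    const id = row.date + row.location;
    return fakeQuery().then(() => id);
}

const rows = [
    { date: 'd1', location: 'A' },
    { date: 'd2', location: 'B' },
    { date: 'd3', location: 'C' }
];

function runDemo() {
    return Promise.all([
        Promise.all(rows.map(validateShared)),
        Promise.all(rows.map(validateScoped))
    ]);
}

runDemo().then(([shared, scoped]) => {
    console.log(shared); // every entry is the id of the last row
    console.log(scoped); // one distinct id per row
});
```

So a closure is indeed the answer to the original question, but it comes for free once the variables are declared inside the function.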

// This is live data, coming in concurrently, forever. Promises from previous batch must be resolved before the next batch is received.
let pending = Promise.resolve([]); // previous batch starting w/ resolved promise
batchOfRows.on('message', function (data) {
    // not sure where batchOfRows was coming from in your code
    const nextBatch = () => Promise.all(
      data.batchOfRows.rows.map(validate)
    );

    // reassign pending to a new promise
    // whatever happened to the previous promise, we keep running
    pending = pending
      .then(nextBatch)
      .catch(e => console.error(e))
});

// For each row received, give it an ID and then insert into the DB
function validate(data) {
    const id = data.date + data.location
    return  DB.execute('select * from table1 where id = ?', id)
              .then(result => insertIntoDB(data, id).then(() => result));
}

// Inserting into DB
function insertIntoDB(message, id) {
    const query = "insert into table2 where id = ? and messageBody = ?";
    return DB.execute(query, [id, JSON.stringify(message)]);
}
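
To check that this pattern really keeps batches in order, the sketch below swaps the Kafka consumer for a plain Node.js `EventEmitter` and the database for a hypothetical `fakeInsert` with a random delay; neither is part of the original setup, and `createBatchRunner` is a name invented for the example. Rows inside one batch run concurrently via `Promise.all`, but a batch starts only after the previous one has settled.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the DB call: resolves after a short random delay.
function fakeInsert(row, log) {
    return new Promise(resolve => {
        setTimeout(() => {
            log.push(row);
            resolve(row);
        }, Math.random() * 10);
    });
}

function createBatchRunner(emitter, log) {
    let pending = Promise.resolve([]);

    emitter.on('message', rows => {
        const nextBatch = () => Promise.all(rows.map(row => fakeInsert(row, log)));
        // Chain the new batch behind the previous one; log errors and keep going.
        pending = pending.then(nextBatch).catch(e => console.error(e));
    });

    // Lets callers wait for everything queued so far.
    return () => pending;
}

function runDemo() {
    const emitter = new EventEmitter();
    const log = [];
    const whenIdle = createBatchRunner(emitter, log);

    // Two batches arrive back to back, before the first has finished.
    emitter.emit('message', ['a1', 'a2', 'a3']);
    emitter.emit('message', ['b1', 'b2']);

    return whenIdle().then(() => log);
}

runDemo().then(log => console.log(log));
// All "a" rows are logged before any "b" row; order within a batch may vary.
```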
