Batch requests and concurrent processing

I have a service in Node.js that fetches user details from the DB and sends them to another application via HTTP. There can be millions of user records, so processing them one by one is very slow. I have implemented concurrent processing as follows:

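import { from } from 'rxjs';
import { mergeMap, switchMap, toArray } from 'rxjs/operators';

// Runs inside a class method where getUsersFromDB and publishUser are defined.
const userIds = [1, 2, 3 /* .... */];
// getUsersFromDB resolves to one array containing ALL requested users.
const users$ = from(this.getUsersFromDB(userIds));
const concurrency = 150; // max number of publishUser calls in flight

users$.pipe(
    switchMap((users) =>
        from(users).pipe(
            // Publish users with at most `concurrency` requests running at once.
            mergeMap((user) => from(this.publishUser(user)), concurrency),
            toArray()
        )
    )
).subscribe({
    next: (partialResults: any) => {
        // Do something with partial results.
    },
    error: (err: any) => {
        // Error
    },
    complete: () => {
        // Done.
    }
});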

This works fine for thousands of user records: it processes 150 user records concurrently, which is much faster than publishing users one by one.

The problem occurs when processing millions of user records: fetching them from the database is slow, because the result set grows to several GB (which also means high memory usage). I am looking for a way to fetch user records from the DB in batches while continuing to publish those records concurrently.

I am thinking of a solution like this: maintain a queue (of size N) of user records fetched from the DB; whenever the queue holds fewer than N records, fetch the next N results from the DB and add them to the queue. My current solution would then keep taking records from this queue and processing them with the defined concurrency. But I am not quite able to put this into code. Is there a way to do this using RxJS?
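The queue idea itself does not strictly need RxJS. Below is a minimal plain TypeScript sketch of it that uses async/await workers instead of mergeMap (a different technique from the RxJS pipeline, shown only for comparison). fetchUsersAfter, publishUser, the User type and the keyset-pagination query are hypothetical stand-ins simulated in memory, and error handling is omitted.

```typescript
// Hypothetical user shape; a real row would carry more fields.
type User = { id: number };

// Hypothetical DB read using keyset pagination: up to `limit` users with id > afterId.
// Simulated here for ids 1..total.
async function fetchUsersAfter(afterId: number, limit: number, total: number): Promise<User[]> {
  const users: User[] = [];
  for (let id = afterId + 1; id <= Math.min(afterId + limit, total); id++) {
    users.push({ id });
  }
  return users;
}

// Hypothetical stand-in for the HTTP call; resolves with the published id.
async function publishUser(user: User): Promise<number> {
  await new Promise((resolve) => setTimeout(resolve, 1));
  return user.id;
}

// Refills the queue with the next `queueSize` users whenever it holds fewer than
// `queueSize`, while `concurrency` workers keep publishing from it.
// Error handling is omitted for brevity.
async function processAll(total: number, queueSize: number, concurrency: number): Promise<number[]> {
  const queue: User[] = [];
  const published: number[] = [];
  let lastId = 0;
  let exhausted = false;
  let refill: Promise<void> | null = null;

  // Starts at most one DB fetch at a time; concurrent callers share the pending promise.
  function ensureFilled(): Promise<void> {
    if (exhausted || queue.length >= queueSize) return Promise.resolve();
    if (refill !== null) return refill;
    const p = fetchUsersAfter(lastId, queueSize, total).then((batch) => {
      if (batch.length === 0) {
        exhausted = true; // no more rows in the DB
      } else {
        lastId = batch[batch.length - 1].id;
        queue.push(...batch);
      }
      refill = null;
    });
    refill = p;
    return p;
  }

  async function worker(): Promise<void> {
    for (;;) {
      const pending = ensureFilled(); // prefetch in the background when running low
      const user = queue.shift();
      if (user === undefined) {
        await pending; // nothing buffered: wait for the DB
        if (exhausted && queue.length === 0) return;
        continue;
      }
      published.push(await publishUser(user));
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return published;
}
```

Because a refill starts as soon as the queue drops below queueSize, the next page is usually loaded before the workers run dry, and fewer than 2 × queueSize users are buffered at any time.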

I think your approach is the right one, i.e. using the concurrent parameter of mergeMap.

What I do not understand is why you add toArray at the end of the inner pipe.

toArray buffers all the notifications coming from upstream and emits them as a single array only when the upstream completes.

This means that, in your case, the subscribe callback does not receive partial results: it receives one array with the results of publishUser for all users, and only after every user has been published.

Conversely, if you remove toArray and keep mergeMap with its concurrent parameter, the subscribe callback receives a continuous stream of results, one per user, as each publishUser call completes.
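To see the difference, here is a small comparison, assuming RxJS 7 (the same imports also work with RxJS 6). publishUser is a hypothetical stand-in that resolves with the id after a 10 ms timer.

```typescript
import { from } from 'rxjs';
import { mergeMap, toArray } from 'rxjs/operators';

// Hypothetical stand-in for publishUser: resolves with the user id after a short delay.
const publishUser = (id: number): Promise<number> =>
  new Promise((resolve) => setTimeout(() => resolve(id), 10));

const ids = [1, 2, 3, 4, 5];
const concurrency = 2;

// With toArray: a single emission, only after every publishUser call has finished.
const withToArray: number[][] = [];
from(ids)
  .pipe(mergeMap((id) => from(publishUser(id)), concurrency), toArray())
  .subscribe((all) => withToArray.push(all));

// Without toArray: one emission per user, as soon as its publishUser call resolves.
const streamed: number[] = [];
from(ids)
  .pipe(mergeMap((id) => from(publishUser(id)), concurrency))
  .subscribe((result) => streamed.push(result));
```

Once both pipelines finish, withToArray holds a single entry containing all five results, while streamed has received five separate values, each one as soon as it was available.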

That covers the RxJS side. Beyond that, check whether the specific DB you are using supports batch reads. If it does, you can group the user ids into buffers with the bufferCount operator and query the DB with each buffer.
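A minimal sketch of that suggestion, assuming RxJS 7 and a DB that can load a list of ids in one query. getUsersFromDB and publishUser are hypothetical in-memory stand-ins, and using concatMap over the batches (so only one batch is held in memory at a time) is one possible choice, not the only one.

```typescript
import { from, Observable } from 'rxjs';
import { bufferCount, concatMap, mergeMap } from 'rxjs/operators';

type User = { id: number };

// Hypothetical batch read, e.g. SELECT ... WHERE id IN (...) against a real DB.
async function getUsersFromDB(ids: number[]): Promise<User[]> {
  return ids.map((id) => ({ id }));
}

// Hypothetical stand-in for the HTTP call; resolves with the published id.
async function publishUser(user: User): Promise<number> {
  await new Promise((resolve) => setTimeout(resolve, 1));
  return user.id;
}

function publishInBatches(userIds: number[], batchSize: number, concurrency: number): Observable<number> {
  return from(userIds).pipe(
    // Group ids into arrays of at most batchSize elements.
    bufferCount(batchSize),
    // concatMap fetches the next batch only after the current one is fully published,
    // so at most one batch of users is in memory at a time.
    concatMap((batch) =>
      from(getUsersFromDB(batch)).pipe(
        // Flatten the fetched array into individual users.
        mergeMap((users) => from(users)),
        // Publish with at most `concurrency` requests in flight.
        mergeMap((user) => from(publishUser(user)), concurrency)
      )
    )
  );
}
```

One consequence of concatMap is that concurrency briefly drops below `concurrency` while the last requests of a batch finish and the next batch is fetched; larger batches make these pauses rarer, at the cost of more memory.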
