
Streaming query with mssql and node, very slow the first time

I am using Node.js 10.16.0 and the node-mssql module to connect to a database. The connection works and my simple queries run fine.

If I stream the results of a query, following the node-mssql example, the first execution is very slow. It doesn't raise a timeout error, but it takes a minute or more to complete.

According to the console log, it fetches the first 55 rows and then stops for a while. There seems to be a delay between the "sets" of data that I split the stream into, as in my code below. If I execute the same query a second or third time, it takes only a second to complete. The result has about 25,000 rows or more.

How can I make my streaming queries faster, at least the first time?

Here is my code.

Following the example, the idea is: start streaming, collect 1000 rows, pause the stream, process those rows, send them back over WebSockets, empty all arrays, and resume streaming until done.

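let skate = [];
let leather = [];
let waterproof = [];
let rowsToProcess = [];    // rows buffered since the last pause
let measurementsData = []; // payload built from each batch
let rowc = 0;              // running row count, used only for logging
let stream_start = new Date();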

const request = new sql.Request(pool);
request.stream = true;     
request
.input('id_param', sql.Int, parseInt(id))
.input('start_date_param', sql.VarChar(50), startDate)
.input('stop_date_param', sql.VarChar(50), stopDate)  
.query('SELECT skate, leather , waterproof FROM shoes WHERE id = @id_param AND CAST(startTime AS date) BETWEEN @start_date_param AND @stop_date_param ');

request.on('row', row => {     
  rowc++; console.log(rowc);
  rowsToProcess.push(row); 
  if (rowsToProcess.length >= 1000) {  
    request.pause();
    processRows();
  } 
});

const processRows = () => {
  rowsToProcess.forEach((item, index) => { 
    skate.push(item.skate);  
    leather.push(item.leather );  
    waterproof.push(item.waterproof);  
  });              
  measurementsData.push(
    {title: 'Skate shoes', data: skate}, 
    {title: 'Leather shoes', data: leather}, 
    {title: 'Waterproof shoes', data: waterproof}
  );  
  console.log('another processRows done');  
  //ws.send(JSON.stringify({ message: measurementsData }));
  rowsToProcess = [];
  skate= [];
  leather= [];
  waterproof = [];       
  measurementsData = [];
  request.resume();
}

request.on('done', () => {      
  console.log('rowc , ', rowc);
  console.log('stream start , ', stream_start);
  console.log('stream done , ', new Date());
  processRows(); 
});

I would try to improve the indexing of the shoes table. From what I see, there are two possible issues with your query and indexing:

  • You filter on the datetime column startTime, but only the id column is indexed (according to the comments)
  • You cast the datetime to date inside the WHERE clause of the query

Indexes

Since you filter only on the date, without the time part, I'd suggest creating a new column startDate that holds startTime converted to date, and creating an index on it. Then use this indexed column in the query.

Also, since you select only the skate, leather, and waterproof columns, including them in the index could improve performance. Read about indexes with included columns.

If you always select data that is newer or older than a certain date, you may also want to look into filtered indexes.
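As a rough sketch of the indexing advice above, the statements below add a date-only column and an index that covers the query. This version uses a persisted computed column rather than a manually maintained one, which is a swapped-in choice and not necessarily what the answer intended. The index name IX_shoes_id_startDate and the decision to put id first in the key are assumptions; check them against your own schema and execution plan before applying anything.

```javascript
// Hypothetical migration for the `shoes` table from the question.
// Assumes startTime is a datetime column, so the cast to date is
// deterministic and the computed column can be PERSISTED.
const addDateColumn = `
ALTER TABLE shoes
  ADD startDate AS CAST(startTime AS date) PERSISTED;`;

// Key columns match the WHERE clause (id, then the date range);
// INCLUDE carries the selected columns so the index covers the query.
const createCoveringIndex = `
CREATE NONCLUSTERED INDEX IX_shoes_id_startDate
  ON shoes (id, startDate)
  INCLUDE (skate, leather, waterproof);`;

// Run these once, in order, with whatever SQL client you normally use.
const migrationStatements = [addDateColumn, createCoveringIndex].map(s => s.trim());
```

The strings are kept in JavaScript only so they sit next to the application code; running them directly in a SQL client works just as well.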

Avoid cast in where

Although a cast is generally cheap, applying it to a column inside the WHERE clause can keep SQL Server from making efficient use of indexes, so you should avoid it.

If you create a new column with just the date part and index it as cited above, you don't need to use cast here:

WHERE id = @id_param AND startDate BETWEEN @start_date_param AND @stop_date_param

When a query runs slowly the first time but fast on subsequent executions, as someone suggested earlier, it's generally due to caching. The first-run performance is quite likely related to the storage device the database runs on.

I expect the execution plan does not change between executions.

You should remove the cast in the WHERE clause, or create an index on a computed column (if your database supports it).

Operations applied to a column in a predicate may hurt your query, so avoid them if possible.

Instead, try setting your parameters so that they cover whole days:

@start_date_param to yyyy-mm-dd 00:00:00

@stop_date_param to yyyy-mm-dd 23:59:59

AND startTime BETWEEN @start_date_param AND @stop_date_param
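One caveat with a 23:59:59 upper bound is that it can miss rows stamped in the last fraction of a second of the day. A common alternative is a half-open range: greater than or equal to the start day, and strictly less than the day after the stop day. Below is a minimal sketch of that idea; the helper name dayRange and the parameters @from_param and @to_param are hypothetical, and it assumes the day strings are yyyy-mm-dd and that timestamps are handled as UTC.

```javascript
// Turn two 'yyyy-mm-dd' strings into a half-open [from, to) range
// that covers every instant of both days and everything in between.
function dayRange(startDay, stopDay) {
  const from = new Date(`${startDay}T00:00:00Z`);
  const to = new Date(`${stopDay}T00:00:00Z`);
  if (Number.isNaN(from.getTime()) || Number.isNaN(to.getTime())) {
    throw new RangeError('expected dates formatted as yyyy-mm-dd');
  }
  to.setUTCDate(to.getUTCDate() + 1); // exclusive upper bound: midnight after stopDay
  return { from, to };
}

// No function is applied to startTime, so an index on it stays usable.
const rangeQuery =
  'SELECT skate, leather, waterproof FROM shoes ' +
  'WHERE id = @id_param AND startTime >= @from_param AND startTime < @to_param';

// Possible use with the request from the question (not executed here):
//   const { from, to } = dayRange(startDate, stopDate);
//   request.input('from_param', sql.DateTime, from)
//          .input('to_param', sql.DateTime, to)
//          .query(rangeQuery);
```

Binding the bounds as datetime values instead of VarChar(50) strings also avoids an implicit string-to-date conversion on the server side.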
