Measure time performance of ResultSet from SQL queries

Question

I have trouble understanding how a "ResultSet" works. If I want to measure how long it takes to execute a query, do I need to iterate through the ResultSet with while(rs.next()), or does the result set already contain all the results once executeQuery returns? Or is it more like a buffer, where some tuples are only generated while I iterate through the ResultSet?

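// Assumed placement: the clock starts before the query is executed.
long startTime = System.currentTimeMillis();
Statement b = conn.createStatement();
ResultSet rs2 = b.executeQuery("Select o_orderkey, o_orderstatus, o_orderdate, o_orderpriority, o_comment from orders");
// Read every column of every row so that the fetch is part of the measurement.
while (rs2.next()) {
    int okey = rs2.getInt(1);
    String st = rs2.getString(2);
    Date dt = rs2.getDate(3);
    String pr = rs2.getString(4);
    String co = rs2.getString(5);
}
long endTime = System.currentTimeMillis();
// i is the run counter of the surrounding benchmark loop (not shown).
System.out.println(i + ". DuckDB " + (endTime - startTime) + " ms");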

For this example there is a huge difference in the measured time. When I only measure the time needed to obtain the ResultSet, without the while loop, it is only a fraction of the total. That's why I suspect it depends on the database, since DuckDB processes queries in a vectorized fashion.

So my question is: which way is the correct one if I only want the time it takes to answer the query?

Answer 1

I don't know DuckDB, so I can't answer specifically for that database system.

In general, there is no simple answer to this question. Some JDBC drivers fetch all rows when you execute the query and only then return the result set, while other JDBC drivers fetch rows only as you iterate over the result set. A driver may batch rows, so that multiple calls to next() are satisfied from a single batch and a roundtrip to the server happens only when the batch is (nearly) empty, or it may make a roundtrip to the database for each call to next(). In theory, it is even possible that each getXXX call makes a roundtrip to the database (though that is uncommon, or only applies to blobs).

In other words, the behaviour varies between database systems and their drivers, and could also depend on whether you're in auto-commit mode, whether you use an updatable or scrollable result set, and possibly other factors (configuration of the driver, version of the database system, etc.).

In short, given that the behaviour varies, the only sure way is to measure across both the execute and the fetching of all rows.
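
A minimal sketch of such a measurement, using only standard JDBC calls. The class name QueryTimer and the nested Timing type are made up for illustration; the split into an "execute" and a "fetch" phase is only there to show where a given driver spends its time, and the sum of the two is the number to compare across systems.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper (not part of JDBC): times the execute and the fetch phase.
class QueryTimer {

    /** Timings for one run, in nanoseconds, plus the number of rows consumed. */
    static final class Timing {
        final long executeNanos;
        final long fetchNanos;
        final long rows;

        Timing(long executeNanos, long fetchNanos, long rows) {
            this.executeNanos = executeNanos;
            this.fetchNanos = fetchNanos;
            this.rows = rows;
        }

        long totalNanos() {
            return executeNanos + fetchNanos;
        }
    }

    static Timing time(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            long afterExecute = System.nanoTime();

            int columns = rs.getMetaData().getColumnCount();
            long rows = 0;
            while (rs.next()) {
                // Touch every column so lazy fetching and type conversion are included.
                for (int c = 1; c <= columns; c++) {
                    rs.getObject(c);
                }
                rows++;
            }
            long afterFetch = System.nanoTime();

            return new Timing(afterExecute - start, afterFetch - afterExecute, rows);
        }
    }
}
```

With an open connection, QueryTimer.time(conn, "select ... from orders") returns both phases; a driver that materializes everything up front will show most of the time in executeNanos, a streaming driver will show it in fetchNanos. System.nanoTime() is used instead of currentTimeMillis() because it is meant for measuring elapsed time.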

Answer 2

DuckDB uses a vectorized execution engine, which allows for streamed query processing. If the query result is not fully materialized, then every time you call next() and the current batch is exhausted, you get the next result batch (i.e., the query plan is executed on the next 1024 elements of your table).
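
To make the effect on timing concrete, here is a toy model of a streamed result. It is not DuckDB's code; the class StreamedResult and its counter are invented for illustration, and only the batch size of 1024 is taken from the description above. Creating the object does no work at all, and each batch is "computed" only when the consumer runs out of rows.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Toy model (hypothetical): rows are produced in batches, only on demand.
class StreamedResult implements Iterator<Integer> {
    static final int BATCH_SIZE = 1024;

    private final int totalRows;
    private int batchEnd = 0;   // index one past the last row of the current batch
    private int cursor = 0;     // next row to hand out
    int batchesComputed = 0;    // how many times the "query plan" has run

    StreamedResult(int totalRows) {
        this.totalRows = totalRows;
    }

    @Override
    public boolean hasNext() {
        return cursor < totalRows;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (cursor == batchEnd) {
            // Stand-in for executing the plan on the next chunk of the table.
            batchEnd = Math.min(batchEnd + BATCH_SIZE, totalRows);
            batchesComputed++;
        }
        return cursor++;
    }
}
```

For 3000 rows, batchesComputed is 0 right after construction and 3 once every row has been consumed, which is why stopping the clock before the loop captures almost none of the work in such a model.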

Besides that, there are some conversion costs on the Java side, since the values have to be converted to Java types.

If you want to do a Java benchmark, I would say that fully consuming the result is the way to go, as long as you do the same for the other systems you are comparing with :-)
