
Does the number of columns affect the performance of a count(*) query on a table with no indexes?

I'm creating benchmark tables to measure hourly load (query latency, query duration) on our Netezza box. I have a fair idea of the number of rows I want in the tables, and I'm trying to decide on the number of columns. There will be no indexes; I'll be running full table scans.

I'm trying to decide how many columns I need in the benchmark tables and had the following question: how does the number of columns (and their type) affect the performance of a count(*) query? My initial thought is that a table with more columns will be spread over more disk blocks; consequently, the system will have to make more disk seeks, leading to a longer query.

While I'm working on Netezza, I welcome answers relevant to other systems as well (MySQL, Postgres, Vertica, etc.) to help my general understanding.

There have been several discussions already (Q1, Q2, Q3) on the impact of column count on query performance. Those questions deal with generic queries rather than a full table scan without indexes, hence this separate question.

Yes, the number of columns will (indirectly) influence the performance. The data in the columns will also affect the speed.

Why is that?

Every DBMS stores rows in blocks - typically 8k blocks, but not necessarily; data warehouse systems in particular tend to use larger block sizes. If a table has many columns with a lot of data in them (think varchar columns), fewer rows fit into a single database block.
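In Postgres, for example, the block size is fixed at build time and can be checked directly (a quick sketch; 8192 bytes is the default for standard builds):

show block_size;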

For a transactional system that supports proper isolation, a count(*) query must visit every row in the table (and check for each row whether it is currently visible to your transaction). The smallest unit a DBMS reads from the hard disk is a block, so the more rows that fit into a block, the less I/O needs to be done.

If a row takes 100 bytes on average, a single block will contain approximately 80 rows. To count all rows in a table with 80 rows, the database therefore only needs to do a single I/O operation (in reality a bit more, to look up the table itself).

Now if each row needs 1000 bytes, a single block contains approximately 8 rows, which in turn means that counting all 80 rows requires 10 I/O operations.

Even if the data is cached, it's still 1 "logical" I/O vs. 10 "logical" I/O operations.
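The same back-of-the-envelope arithmetic can be expressed in SQL (a minimal sketch that assumes an 8192-byte block and ignores per-row and per-page overhead):

-- blocks needed = ceil(row_count / rows_per_block), with rows_per_block = floor(block_size / row_size)
select ceil(80 / floor(8192.0 / 100))  as blocks_for_100_byte_rows,  -- 1
       ceil(80 / floor(8192.0 / 1000)) as blocks_for_1000_byte_rows; -- 10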

The above only holds true if no indexes are involved.

This effect can be observed with any DBMS that supports a detailed execution plan. Take this sample script for Postgres:

create table data5 (c1 text, c2 text, c3 text, c4 text, c5 text);
-- note: only four of the five columns are populated here, so c5 stays NULL
insert into data5 
select rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X')
from generate_series(1,100000);

create table data10 (c1 text, c2 text, c3 text, c4 text, c5 text, c6 text, c7 text, c8 text, c9 text, c10 text);
insert into data10 
select rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X'),
       rpad('X',50,'X')
from generate_series(1,100000);

The above creates two tables with 100,000 rows each: one with 5 columns, the other with 10.

When doing an explain analyze, the following will be returned (Postgres 9.3):

explain (analyze, buffers, verbose)
select count(*)
from data5;

Aggregate  (cost=4192.00..4192.01 rows=1 width=0) (actual time=27.539..27.539 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=2942
  ->  Seq Scan on stuff.data5  (cost=0.00..3942.00 rows=100000 width=0) (actual time=0.005..16.158 rows=100000 loops=1)
        Output: c1, c2, c3, c4, c5
        Buffers: shared hit=2942
Total runtime: 27.595 ms

The line Buffers: shared hit=2942 tells us that Postgres had to look at 2942 blocks in order to read through the entire table.

Now for the table with 10 columns:

explain (analyze, buffers, verbose)
select count(*)
from data10;

Aggregate  (cost=7917.00..7917.01 rows=1 width=0) (actual time=34.964..34.965 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=6667
  ->  Seq Scan on stuff.data10  (cost=0.00..7667.00 rows=100000 width=0) (actual time=0.010..22.187 rows=100000 loops=1)
        Output: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
        Buffers: shared hit=6667
Total runtime: 35.025 ms

We can see that Postgres had to look at 6667 blocks to get the count.
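To cross-check these buffer counts against the size of the tables on disk, the block counts can also be read from the catalog (a minimal sketch; relpages is an estimate kept current by VACUUM/ANALYZE, and the default 8192-byte block size is assumed):

select relname, relpages, reltuples
from pg_class
where relname in ('data5', 'data10');

-- or derive the exact block count from the physical file size
select pg_relation_size('data5')  / 8192 as data5_blocks,
       pg_relation_size('data10') / 8192 as data10_blocks;

The numbers reported here should be very close to the 2942 and 6667 buffers shown in the plans above.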

A similar thing can be observed using Oracle's SQL*Plus and the set autotrace statistics option, which will also show the amount of (logical) I/O that was performed.
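A minimal sketch of such a session, assuming the same data5 and data10 tables exist in an Oracle schema (autotrace needs access to the session statistics, typically granted via the PLUSTRACE role):

set autotrace traceonly statistics
select count(*) from data5;
select count(*) from data10;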

The statistics for the data5 table would look something like this:

VALUE | STATISTIC                             
------+---------------------------------------
  140 | bytes received via SQL*Net from client
  755 | bytes sent via SQL*Net to client      
 2977 | consistent gets                       
    0 | db block gets                         
    0 | physical reads                        
    0 | recursive calls                       
    0 | redo size                             
 2977 | session logical reads                 
    1 | sorts (memory)                        
    2 | SQL*Net roundtrips to/from client     
    1 | rows processed                        

The "consistent gets" denote the amount of logical I/O:

For the data10 table the output is as follows:

VALUE | STATISTIC                             
------+---------------------------------------
  141 | bytes received via SQL*Net from client
  615 | bytes sent via SQL*Net to client      
 7184 | consistent gets                       
    0 | db block gets                         
    0 | physical reads                        
    0 | recursive calls                       
    0 | redo size                             
 7184 | session logical reads                 
    1 | sorts (memory)                        
    2 | SQL*Net roundtrips to/from client     
    1 | rows processed                        

Again, the increase in (logical) I/O is clearly visible.
