
Table size - MariaDB ColumnStore vs InnoDB

Every analysis I found of MariaDB ColumnStore claims that it uses less disk space than regular engines like InnoDB, e.g.: https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/

But that is not what I found in my tests.

CREATE TABLE `innodb_test` (id int, value1 bigint, value2 bigint, value3 bigint, value4 bigint, value5 bigint) ENGINE=innodb;

CREATE TABLE `columnstore_test` (id int COMMENT 'compression=2', value1 bigint COMMENT 'compression=2', value2 bigint COMMENT 'compression=2', value3 bigint COMMENT 'compression=2', value4 bigint COMMENT 'compression=2',value5 bigint COMMENT 'compression=2') ENGINE=columnstore;

Insert 1 million rows (an id plus five value columns set to 0) into the tables:

INSERT INTO innodb_test
SELECT CONCAT(a1.id,a2.id,a3.id,a4.id,a5.id,a6.id),
0,0,0,0,0
from 
  (select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a1, 
  (select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a2,
  (select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a3,
  (select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a4,
  (select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a5,
  (select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a6;

INSERT INTO columnstore_test SELECT * FROM innodb_test;

The ColumnStore table is bigger than the InnoDB table:

call columnstore_info.table_usage(NULL, 'columnstore_test');
+--------------+------------------+-----------------+-----------------+-------------+
| TABLE_SCHEMA | TABLE_NAME       | DATA_DISK_USAGE | DICT_DISK_USAGE | TOTAL_USAGE |
+--------------+------------------+-----------------+-----------------+-------------+
| size_comp    | columnstore_test | 352.05 MB       | 0 Bytes         | 0 Bytes     |
+--------------+------------------+-----------------+-----------------+-------------+

SELECT table_name, (data_length + index_length) / (1024 * 1024) "Size in MB"  FROM information_schema.tables WHERE table_schema = schema() AND table_name = 'innodb_test';
+-------------+------------+
| table_name  | Size in MB |
+-------------+------------+
| innodb_test | 71.6094    |
+-------------+------------+

Also, if I create the table without compression, the size is almost the same:

CREATE TABLE `columnstore_no_compression` (id int COMMENT 'compression=0', value1 bigint COMMENT 'compression=0', value2 bigint COMMENT 'compression=0', value3 bigint COMMENT 'compression=0', value4 bigint COMMENT 'compression=0',value5 bigint COMMENT 'compression=0') ENGINE=columnstore;

INSERT INTO columnstore_no_compression SELECT * FROM innodb_test;

call columnstore_info.table_usage(NULL, 'columnstore_no_compression');
+--------------+----------------------------+-----------------+-----------------+-------------+
| TABLE_SCHEMA | TABLE_NAME                 | DATA_DISK_USAGE | DICT_DISK_USAGE | TOTAL_USAGE |
+--------------+----------------------------+-----------------+-----------------+-------------+
| size_comp    | columnstore_no_compression | 352.00 MB       | 0 Bytes         | 0 Bytes     |
+--------------+----------------------------+-----------------+-----------------+-------------+

I'm using version mariadb-columnstore-1.1.2-1.

my.ini file:

[client]
port = 3306
socket          = /usr/local/mariadb/columnstore/mysql/lib/mysql/mysql.sock

[mysqld]
loose-server_audit_syslog_info = columnstore-1
port = 3306
socket          = /usr/local/mariadb/columnstore/mysql/lib/mysql/mysql.sock
datadir         = /ssd/mariadb/db
skip-external-locking
key_buffer_size = 512M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 4M
read_buffer_size = 4M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 0
thread_stack = 512K
lower_case_table_names=1
group_concat_max_len=512
sql_mode="ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
infinidb_compression_type=2
infinidb_stringtable_threshold=20
infinidb_local_query=0
infinidb_diskjoin_smallsidelimit=0
infinidb_diskjoin_largesidelimit=0
infinidb_diskjoin_bucketsize=100
infinidb_um_mem_limit=0
infinidb_use_import_for_batchinsert=1
infinidb_import_for_batchinsert_delimiter=7
basedir                         = /usr/local/mariadb/columnstore/mysql/
character-sets-dir              = /usr/local/mariadb/columnstore/mysql/share/charsets/
lc-messages-dir                 = /usr/local/mariadb/columnstore/mysql/share/
plugin_dir                      = /usr/local/mariadb/columnstore/mysql/lib/plugin
binlog_format=ROW
server-id = 1
log-bin=/usr/local/mariadb/columnstore/mysql/db/mysql-bin
relay-log=/usr/local/mariadb/columnstore/mysql/db/relay-bin
relay-log-index = /usr/local/mariadb/columnstore/mysql/db/relay-bin.index
relay-log-info-file = /usr/local/mariadb/columnstore/mysql/db/relay-bin.info
tmpdir          = /ssd/tmp/

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash

[isamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

Is that the expected behavior or am I doing something wrong?

I'm the lead software engineer for MariaDB ColumnStore.

ColumnStore is optimised for large data sets and pre-allocates disk space for columns. The advantage is that there is less chance of fragmentation on spinning disks. The downside is that on small data sets such as yours, a lot of the allocated space goes unused.

It starts off by pre-allocating 256KB for the first column extent and then extends this to 2^23 rows (just over 8 million). So for each of your BIGINT columns it will pre-allocate 64MB, and for your INT column 32MB. The small difference between the compressed and uncompressed tables is due to the header blocks in the compressed files. We have some information_schema tables that can show you real usage (to within 8KB):

https://mariadb.com/kb/en/library/columnstore-information-schema-tables/
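
As a rough check of the pre-allocation arithmetic described above, here is a minimal Python sketch. It assumes each column's first extent has been fully extended to 2^23 rows at the column's fixed width (4 bytes for INT, 8 bytes for BIGINT) and ignores the header blocks of compressed files. The names ROWS_PER_EXTENT, COLUMN_WIDTH_BYTES and preallocated_mb are only illustrative and are not part of ColumnStore.

```python
# Assumption: a fully extended first extent reserves space for 2**23 rows
# at the column's fixed width; compressed-file header blocks are ignored.
ROWS_PER_EXTENT = 2 ** 23  # 8,388,608 rows

# Column widths for the test table from the question (INT = 4, BIGINT = 8).
COLUMN_WIDTH_BYTES = {
    "id": 4,
    "value1": 8,
    "value2": 8,
    "value3": 8,
    "value4": 8,
    "value5": 8,
}


def preallocated_mb(width_bytes, rows=ROWS_PER_EXTENT):
    """Space reserved for one column extent, in MB (1 MB = 1024 * 1024 bytes)."""
    return width_bytes * rows / (1024 * 1024)


per_column = {name: preallocated_mb(w) for name, w in COLUMN_WIDTH_BYTES.items()}
total_mb = sum(per_column.values())

# Fraction of each extent that 1 million rows actually occupy.
fill_ratio = 1_000_000 / ROWS_PER_EXTENT

print(per_column)
print(f"total pre-allocated: {total_mb:.2f} MB")  # 352.00 MB
print(f"extent fill: {fill_ratio:.1%}")           # 11.9%
```

Under these assumptions the total matches the 352.00 MB reported for the uncompressed table, and 1 million rows fill only about 12% of each extent.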

So, unless you plan to use a much larger data set (at least in the several GB range), unfortunately you will see large disk usage when there is little data.
