
MariaDB 10 (MySQL 5.6): random high I/O and slow queries after upgrading from MySQL 5.5

We run a Java web application backed by a MySQL database; it connects through MySQL Connector/J 5.1 with c3p0 connection pools. The data collected during the day is transformed into a data warehouse overnight. The application and the database both live on the same EC2 instance (m3.medium) with 4GB of RAM and gp2 (SSD) EBS volumes.

Upgrading the database from MySQL 5.5 to MariaDB 10.0.17 (we also tried MySQL 5.6.23) causes the nightly data warehouse build to take an unreasonably long time under certain conditions.

The data warehouse build is based on a Maven/Ant set of scripts that run SQL queries and JRuby scripts. The data warehouse (DWH) build is executed by a CRON job overnight and the application is not stopped while the DWH build runs. The DWH build first drops the whole dwh schema and then rebuilds it.

Running the DWH build under these conditions against MySQL 5.5 takes approximately 4 hours, while against MariaDB 10 (and MySQL 5.6) it sometimes takes up to 16 hours for the same data. However, this behavior is not consistent: sometimes the DWH builds in 4 hours on MariaDB 10 as well.

These tests were run in test environments where the Java application is connected to the database but issues no queries, so it does not contribute to the DB load.

In particular, two queries in the DWH build sometimes take much longer to run (32K seconds in total versus 4K seconds).

When the DWH gets "stuck" on these queries, I/O against ibdata1 increases and MariaDB uses over 90% of the I/O (based on iotop).

I've spent a lot of time testing different MariaDB configurations (changing the memory allocated to MariaDB, the number of threads, etc.), but the outcome is always the same.

Has anybody else experienced similar issues when upgrading from MySQL 5.5 to 5.6? Any ideas on what else to profile to try to identify the problem?

I also captured charts of the disk I/O usage (ops/s). On 3/24 and 3/26 the DWH build ran normally (4 hours); on 3/25 it took four times as long (16 hours).

[Charts not included: "Reads Ops/s" and "Writes Ops/s"]

The I/O profiles are completely different even though the data and processes are the same, as if MariaDB were doing extra work.

Thanks!

Below is some information I collected while one of the queries was running slowly.

SHOW FULL PROCESSLIST (thread 57 is the DWH build process; everything else belongs to the Java application):

| Id | User  | Host            | db   | Command | Time  | State        | Info                  | Progress |
| 50 | steve | localhost:37029 | qrtz | Sleep   |     3 |              | NULL                  |    0.000 |
| 51 | steve | localhost:37030 | qrtz | Sleep   |     3 |              | NULL                  |    0.000 |
| 52 | steve | localhost:37031 | qrtz | Sleep   |     3 |              | NULL                  |    0.000 |
| 57 | steve | localhost:37366 | dwh  | Query   | 14222 | Sending data | INSERT INTO dwh.assessment_answer_fact ( page_id, page_number, category_id, category_number, category_title, question_id, question_display_number, question_number, question_text, question_identifier, answer_value, assessment_fact_id ) SELECT p.id, p.page_number, ic.id, ic.category_number, ic.title, q.id, q.question_display_number, q.question_number, q.question_text, q.question_identifier, av.value, ca.id FROM smr.cans_assessment ca join survey.surveys s ON ( s.id = ca.survey_id ) join survey.instruments i ON ( s.instrument_id = i.id ) join survey.survey_categories sc ON ( sc.survey_id = s.id ) join survey.instrument_categories ic ON ( sc.instrument_category_id = ic.id ) join survey.pages p ON ( p.id = ic.page_id ) join survey.answers a ON ( a.survey_category_id = sc.id ) join survey.answer_values av ON ( av.answer_id = a.id ) join survey.questions q ON ( a.question_id = q.id ) WHERE ca.id in (select af.assessment_id from dwh.assessment_fact af) |    0.000 |
| 64 | steve | localhost:37700 | NULL | Sleep   |  5394 |              | NULL                  |    0.000 |
| 65 | steve | localhost:37701 | NULL | Sleep   |  5394 |              | NULL                  |    0.000 |
| 66 | steve | localhost:37702 | NULL | Sleep   |  5394 |              | NULL                  |    0.000 |
| 67 | steve | localhost:37703 | NULL | Sleep   |  5394 |              | NULL                  |    0.000 |
| 68 | steve | localhost:37770 | NULL | Query   |     0 | init         | show full processlist |    0.000 |

show engine innodb status :

-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 28198 srv_active, 0 srv_shutdown, 35270 srv_idle
srv_master_thread log flush and writes: 63465
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 247509
OS WAIT ARRAY INFO: signal count 230685
Mutex spin waits 233790, rounds 7015555, OS waits 233481
RW-shared spins 5789, rounds 173633, OS waits 5403
RW-excl spins 3842, rounds 258871, OS waits 7270
Spin rounds per wait: 30.01 mutex, 29.99 RW-shared, 67.38 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 2795813
Purge done for trx's n:o < 2795733 undo n:o < 0 state: running but idle
History list length 327
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 68, OS thread handle 0x7fa659961700, query id 135532 localhost 127.0.0.1 steve init
show engine innodb status
---TRANSACTION 2773754, not started
MySQL thread id 52, OS thread handle 0x7fa6923aa700, query id 135518 localhost 127.0.0.1 steve cleaning up
---TRANSACTION 2795812, not started
MySQL thread id 51, OS thread handle 0x7fa6923f3700, query id 135531 localhost 127.0.0.1 steve cleaning up
---TRANSACTION 2773251, not started
MySQL thread id 1, OS thread handle 0x7fa6924ce700, query id 0 Waiting for background binlog tasks
---TRANSACTION 2794864, ACTIVE 14118 sec fetching rows
mysql tables in use 11, locked 11
136069 lock struct(s), heap size 20477480, 23704803 row lock(s), undo log entries 3791652
MySQL thread id 57, OS thread handle 0x7fa69243c700, query id 129742 localhost 127.0.0.1 steve Sending data
INSERT INTO dwh.assessment_answer_fact ( page_id, page_number, category_id, category_number, category_title, question_id, question_display_number, question_number, question_text, question_identifier, answer_value, assessment_fact_id ) SELECT p.id, p.page_number, ic.id, ic.category_number, ic.title, q.id, q.question_display_number, q.question_number, q.question_text, q.question_identifier, av.value, ca.id FROM smr.cans_assessment ca join survey.surveys s ON ( s.id = ca.survey_id ) join survey.instruments i ON ( s.instrument_id = i.id ) join survey.survey_categories   sc ON ( sc.survey_id = s.id
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table `survey`.`questions` trx id 2794864 lock mode IS lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 852 page no 9 n bits 184 index `PRIMARY` of table `survey`.`questions` trx table locks 12 total table locks 1  trx id 2794864 lock mode S lock hold time 14118 wait time before grant 0
TABLE LOCK table `survey`.`answers` trx id 2794864 lock mode IS lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 861 page no 12 n bits 824 index `FKCD7DB87560DC6F04` of table `survey`.`answers` trx table locks 12 total table locks 1  trx id 2794864 lock mode S lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 861 page no 6 n bits 336 index `PRIMARY` of table `survey`.`answers` trx table locks 12 total table locks 1  trx id 2794864 lock mode S locks rec but not gap lock hold time 14118 wait time before grant 0
TABLE LOCK table `survey`.`survey_categories` trx id 2794864 lock mode IS lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 855 page no 7 n bits 328 index `PRIMARY` of table `survey`.`survey_categories` trx table locks 12 total table locks 1  trx id 2794864 lock mode S locks rec but not gap lock hold time 14118 wait time before grant 0
TABLE LOCK table `smr`.`cans_assessment` trx id 2794864 lock mode IS lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 1800 page no 42 n bits 824 index `FK52F1DDEAF118504` of table `smr`.`cans_assessment` trx table locks 12 total table locks 1  trx id 2794864 lock mode S locks gap before rec lock hold time 14118 wait time before grant 0
RECORD LOCKS space id 861 page no 32512 n bits 336 index `PRIMARY` of table `survey`.`answers` trx table locks 12 total table locks 1  trx id 2794864 lock mode S locks rec but not gap lock hold time 14118 wait time before grant 0
TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
29827256 OS file reads, 1144210 OS file writes, 345551 OS fsyncs
631.07 reads/s, 16384 avg bytes/read, 20.61 writes/s, 9.37 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 858, free list len 5697, seg size 6556, 299256 merges
merged operations:
 insert 7839828, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
2106.29 hash searches/s, 2190.81 non-hash searches/s
---
LOG
---
Log sequence number 184023146842
Log flushed up to   184023103841
Pages flushed up to 184006513482
Last checkpoint at  184006296459
Max checkpoint age    216721613
Checkpoint age target 209949063
Modified age          16633360
Checkpoint age        16850383
0 pending log writes, 0 pending chkp writes
42065 log i/o's done, 1.22 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 549453824; in additional pool allocated 0
Total memory allocated by read views 512
Internal hash tables (constant factor + variable factor)
    Adaptive hash index 10673872    (8851048 + 1822824)
    Page hash           553976 (buffer pool 0 only)
    Dictionary cache    4009988     (2214224 + 1795764)
    File system         914416  (812272 + 102144)
    Lock system         21808096    (1329176 + 20478920)
    Recovery system     0   (0 + 0)
Dictionary memory allocated 1795764
Buffer pool size        32767
Buffer pool size, bytes 536854528
Free buffers            965
Database pages          30442
Old database pages      11252
Modified db pages       1638
Percent of dirty pages(LRU & free pages): 5.215
Max dirty pages percent: 75.000
Pending reads 1
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 687161, not young 141371905
4.33 youngs/s, 3024.61 non-youngs/s
Pages read 29827185, created 180475, written 972592
631.09 reads/s, 4.10 creates/s, 15.67 writes/s
Buffer pool hit rate 952 / 1000, young-making rate 0 / 1000 not 232 / 1000
Pages read ahead 485.66/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 30442, unzip_LRU len: 0
I/O sum[8334]:cur[77], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
1 RW transactions active inside InnoDB
0 RO transactions active inside InnoDB
1 out of 1000 descriptors used
Main thread process no. 7359, id 140352388314880, state: sleeping
Number of rows inserted 7172945, updated 106861, deleted 0, read 1363672517
240.68 inserts/s, 0.00 updates/s, 0.00 deletes/s, 1666.92 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

Metadata locks obtained from SELECT * FROM information_schema.metadata_lock_info (METADATA_LOCK_INFO plugin):

+-----------+-------------------------+-----------------+---------------------+--------------+------------------------+
| THREAD_ID | LOCK_MODE               | LOCK_DURATION   | LOCK_TYPE           | TABLE_SCHEMA | TABLE_NAME             |
+-----------+-------------------------+-----------------+---------------------+--------------+------------------------+
|        57 | MDL_INTENTION_EXCLUSIVE | MDL_STATEMENT   | Global read lock    |              |                        |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | survey       | instrument_categories  |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | survey       | questions              |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | survey       | pages                  |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | survey       | answer_values          |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | survey       | surveys                |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | dwh          | assessment_fact        |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | survey       | answers                |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | smr          | cans_assessment        |
|        57 | MDL_SHARED_WRITE        | MDL_TRANSACTION | Table metadata lock | dwh          | assessment_answer_fact |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | survey       | survey_categories      |
|        57 | MDL_SHARED_READ         | MDL_TRANSACTION | Table metadata lock | survey       | instruments            |
+-----------+-------------------------+-----------------+---------------------+--------------+------------------------+

Files I/O from pt-ioprofile -c sizes (Percona Toolkit):

     total      pread     pwrite      fsync filename
  72269824   72269824          0          0 /data/mysql/smr/survey/answer_values.ibd
   8323072     393216    7929856          0 /data/mysql/smr/ibdata1
   4964352     770048    4194304          0 /data/mysql/smr/dwh/assessment_answer_fact.ibd
   4833280    4833280          0          0 /data/mysql/smr/survey/answers.ibd
   2707456          0    2707456          0 /data/mysql/smr/ib_logfile1
   1572864    1572864          0          0 /data/mysql/smr/dwh/assessment_fact.ibd
   1523712    1523712          0          0 /data/mysql/smr/survey/surveys.ibd
   1114112    1114112          0          0 /data/mysql/smr/survey/survey_categories.ibd
    163840     163840          0          0 /data/mysql/smr/smr/cans_assessment.ibd
     49152      49152          0          0 /data/mysql/smr/survey/questions.ibd
         0          0          0          0 /data/mysql/smr/ib_logfile0

Some MariaDB configuration parameters (the full SHOW GLOBAL VARIABLES output is too long to include here):

log_bin
binlog_format           = mixed
max_binlog_size         = 1073741824
binlog_cache_size       = 4M
innodb_buffer_pool_size         = 512M
innodb_io_capacity              = 100
query_cache_size                = 64M
innodb_flush_neighbors          = 0
innodb_flush_log_at_trx_commit = 0
tmp_table_size          = 32M
innodb_log_file_size    = 128M
innodb_log_buffer_size  = 8M
max_allowed_packet      = 16M
innodb_file_per_table   = 1

Slow query:

INSERT INTO dwh.assessment_answer_fact ( page_id, page_number, category_id, category_number, category_title,
        question_id, question_display_number, question_number, question_text, question_identifier,
        answer_value, assessment_fact_id )
SELECT  p.id, p.page_number, ic.id, ic.category_number, ic.title,
        q.id, q.question_display_number, q.question_number, q.question_text,
        q.question_identifier, av.value, ca.id
    FROM  smr.cans_assessment ca
    join  survey.surveys s ON ( s.id = ca.survey_id )
    join  survey.instruments i ON ( s.instrument_id = i.id )
    join  survey.survey_categories sc ON ( sc.survey_id = s.id )
    join  survey.instrument_categories ic ON ( sc.instrument_category_id = ic.id )
    join  survey.pages p ON ( p.id = ic.page_id )
    join  survey.answers a ON ( a.survey_category_id = sc.id )
    join  survey.answer_values av ON ( av.answer_id = a.id )
    join  survey.questions q ON ( a.question_id = q.id )
    WHERE  ca.id in (select af.assessment_id from dwh.assessment_fact af)

EXPLAIN of the slow query:

+------+-------------+-------+--------+-----------------------------------------------+--------------------+---------+----------------------------------+------+-------------+
| id   | select_type | table | type   | possible_keys                                 | key                | key_len | ref                              | rows | Extra       |
+------+-------------+-------+--------+-----------------------------------------------+--------------------+---------+----------------------------------+------+-------------+
|    1 | PRIMARY     | q     | ALL    | PRIMARY                                       | NULL               | NULL    | NULL                             |  384 |             |
|    1 | PRIMARY     | a     | ref    | PRIMARY,FKCD7DB8757DA6A899,FKCD7DB87560DC6F04 | FKCD7DB87560DC6F04 | 8       | survey.q.id                      |  321 |             |
|    1 | PRIMARY     | sc    | eq_ref | PRIMARY,FKBEB7B82168B10433,FKBEB7B821AF118504 | PRIMARY            | 8       | survey.a.survey_category_id      |    1 |             |
|    1 | PRIMARY     | ca    | ref    | PRIMARY,FK52F1DDEAF118504                     | FK52F1DDEAF118504  | 8       | survey.sc.survey_id              |    1 | Using index |
|    1 | PRIMARY     | af    | eq_ref | PRIMARY                                       | PRIMARY            | 8       | smr.ca.id                        |    1 | Using index |
|    1 | PRIMARY     | s     | eq_ref | PRIMARY,FK91914459785448E4                    | PRIMARY            | 8       | survey.sc.survey_id              |    1 |             |
|    1 | PRIMARY     | i     | eq_ref | PRIMARY                                       | PRIMARY            | 8       | survey.s.instrument_id           |    1 | Using index |
|    1 | PRIMARY     | ic    | eq_ref | PRIMARY,FK1118917415F49764                    | PRIMARY            | 8       | survey.sc.instrument_category_id |    1 |             |
|    1 | PRIMARY     | av    | ref    | FK5075C8C38297FA84                            | FK5075C8C38297FA84 | 8       | survey.a.id                      |    1 |             |
|    1 | PRIMARY     | p     | eq_ref | PRIMARY                                       | PRIMARY            | 8       | survey.ic.page_id                |    1 |             |
+------+-------------+-------+--------+-----------------------------------------------+--------------------+---------+----------------------------------+------+-------------+

Row counts:

| answer_values               |   10766276 |
| answers                     |   12020566 |
| answers_answer_values       |   11662057 |
| cans_assessment             |      77221 |
| instrument_categories       |         85 |
| instruments                 |         11 |
| pages                       |         11 |
| questions                   |        384 |
| survey_categories           |    1462377 |
| survey_categories_answers   |   10954877 |
| surveys                     |     118702 |
| surveys_survey_categories   |    1515111 |

| assessment_fact             |      76803 |
| assessment_answer_fact      |    7673695 |

The real problem is the SELECT:

SELECT  p.id, p.page_number, ic.id, ic.category_number, ic.title,
        q.id, q.question_display_number, q.question_number, q.question_text,
        q.question_identifier, av.value, ca.id
    FROM  smr.cans_assessment ca
    join  survey.surveys s ON ( s.id = ca.survey_id )
    join  survey.instruments i ON ( s.instrument_id = i.id )
    join  survey.survey_categories sc ON ( sc.survey_id = s.id )
    join  survey.instrument_categories ic ON ( sc.instrument_category_id = ic.id )
    join  survey.pages p ON ( p.id = ic.page_id )
    join  survey.answers a ON ( a.survey_category_id = sc.id )
    join  survey.answer_values av ON ( av.answer_id = a.id )
    join  survey.questions q ON ( a.question_id = q.id )
    WHERE  ca.id in (
        SELECT  af.assessment_id
            from  dwh.assessment_fact af
...

Please provide the entire SELECT (it was truncated in the output).

Please check that the JOINs are going through indexes.

Please provide the EXPLAIN SELECT ...

If possible, turn the IN ( SELECT ... ) into a JOIN. (This may be the main performance problem!)

What version? Not just "10" and "5.6". (Need to see if it has the optimization for in ( SELECT ... ) . Old versions optimized this terribly.)
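A quick way to get the exact version and to see how the optimizer handles the IN ( SELECT ... ) is sketched below. The `semijoin` flag of `optimizer_switch` exists in both MySQL 5.6 and MariaDB 10.0; turning it off for one session is only an experiment to compare plans, not a suggested permanent setting. The EXPLAIN keeps just the subquery part of the slow statement for brevity.

```sql
-- Full version string, not just "10" or "5.6"
SELECT VERSION();

-- Current optimizer flags; IN (SELECT ...) can only be run as a
-- semi-join when semijoin=on
SELECT @@optimizer_switch;

-- Baseline plan for the subquery part of the slow statement
EXPLAIN
SELECT ca.id
  FROM smr.cans_assessment ca
 WHERE ca.id IN (SELECT af.assessment_id FROM dwh.assessment_fact af);

-- Same plan with semi-join transformations disabled (this session only)
SET SESSION optimizer_switch = 'semijoin=off';
EXPLAIN
SELECT ca.id
  FROM smr.cans_assessment ca
 WHERE ca.id IN (SELECT af.assessment_id FROM dwh.assessment_fact af);

-- Restore the session value from the global setting
SET SESSION optimizer_switch = DEFAULT;
```

Comparing the two EXPLAIN outputs on 5.5 and on 10.0/5.6 shows whether the newer optimizer picks a different strategy for the subquery.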

How much RAM? A 0.5GB buffer pool is low unless you have a tiny machine.
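The InnoDB status posted above points the same way: a buffer pool hit rate of 952 / 1000 and roughly 631 page reads/s from disk while the query runs. The configured size and the miss counters can be checked as below. In MySQL 5.6 and MariaDB 10.0, `innodb_buffer_pool_size` cannot be resized at runtime, so raising it means editing the configuration file and restarting; how far it can go on a 4GB instance depends on the Java application's heap, which is not stated here.

```sql
-- Configured buffer pool size in MB (512 in the posted configuration)
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Innodb_buffer_pool_read_requests = logical page reads
-- Innodb_buffer_pool_reads         = reads that missed the pool and hit disk
-- Both are cumulative since startup: take two samples during the slow query
-- and compare the deltas
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```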

Please explain the intent of the INSERT..SELECT .

What is the setting of innodb_file_per_table? (This relates to why there might be a lot of I/O on ibdata1.)
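With `innodb_file_per_table = 1`, table data goes to the per-table .ibd files, but ibdata1 still holds the system tablespace: the change buffer, the doublewrite buffer and, unless separate undo tablespaces were configured when the instance was created, the undo logs. The InnoDB status above shows about 3.79 million undo log entries for the running INSERT..SELECT and about 7.8 million merged change-buffer inserts, so these are plausible (unverified) sources of the ibdata1 writes. A sketch of the checks:

```sql
-- Per-table tablespaces (the posted config has innodb_file_per_table = 1)
SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';

-- 0 means the undo logs live in the system tablespace (ibdata1)
SHOW GLOBAL VARIABLES LIKE 'innodb_undo_tablespaces';

-- Size of the open transactions; trx_rows_modified gives an idea of how
-- much undo the long INSERT..SELECT has accumulated so far
SELECT trx_id, trx_mysql_thread_id, trx_started,
       trx_rows_locked, trx_rows_modified, trx_lock_memory_bytes
  FROM information_schema.INNODB_TRX
 ORDER BY trx_rows_modified DESC;
```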

How big are the tables, including the one you are INSERTing into?
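Table sizes (data plus indexes) can be read from information_schema and compared with the 512MB buffer pool. Note that `table_rows` is only an estimate for InnoDB tables.

```sql
-- Approximate on-disk size of every table in the three schemas involved
SELECT table_schema, table_name, table_rows,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
  FROM information_schema.TABLES
 WHERE table_schema IN ('survey', 'smr', 'dwh')
 ORDER BY data_length + index_length DESC;
```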

Edit 1

Assuming this is the entire trailing clause:

WHERE  ca.id in (
    SELECT  af.assessment_id
        from  dwh.assessment_fact af )

Could you check whether removing the WHERE clause and adding this JOIN speeds up the query:

JOIN dwh.assessment_fact af ON af.assessment_id = ca.id
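Applied to the full statement from the question, the suggested rewrite might look like this. It inserts the same rows as the IN ( SELECT ... ) version only if dwh.assessment_fact has at most one row per assessment_id; otherwise each duplicate multiplies the inserted answer rows. The EXPLAIN above, where af is read by eq_ref on its PRIMARY key, suggests the column is unique, but that is worth confirming.

```sql
INSERT INTO dwh.assessment_answer_fact
       ( page_id, page_number, category_id, category_number, category_title,
         question_id, question_display_number, question_number, question_text,
         question_identifier, answer_value, assessment_fact_id )
SELECT p.id, p.page_number, ic.id, ic.category_number, ic.title,
       q.id, q.question_display_number, q.question_number, q.question_text,
       q.question_identifier, av.value, ca.id
  FROM smr.cans_assessment ca
  -- replaces: WHERE ca.id IN (SELECT af.assessment_id FROM dwh.assessment_fact af)
  JOIN dwh.assessment_fact af          ON af.assessment_id = ca.id
  JOIN survey.surveys s                ON s.id = ca.survey_id
  JOIN survey.instruments i            ON s.instrument_id = i.id
  JOIN survey.survey_categories sc     ON sc.survey_id = s.id
  JOIN survey.instrument_categories ic ON sc.instrument_category_id = ic.id
  JOIN survey.pages p                  ON p.id = ic.page_id
  JOIN survey.answers a                ON a.survey_category_id = sc.id
  JOIN survey.answer_values av         ON av.answer_id = a.id
  JOIN survey.questions q              ON a.question_id = q.id;
```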

This 'Answer' focuses on

The DWH build first drops the whole dwh schema and then rebuilds it.

I suggest that is backwards. If you do the following, you have no downtime, hence no visible delay for that slow query:

  1. CREATE TABLE new_1 LIKE real_1 ; -- one per table you are rebuilding
  2. Load the data into the new tables.
  3. Do the nasty INSERT..SELECT in the new tables.
  4. RENAME TABLE real_1 TO old_1, new_1 TO real_1, real_2 TO old_2, ...; -- This is the only 'downtime', and it is instantaneous.
  5. DROP TABLE old_1; -- for each old table.
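For the two tables involved in the slow INSERT..SELECT, the steps above might look like the following sketch. The `_new` and `_old` suffixes are hypothetical names, and the load statements are placeholders for the existing Maven/Ant/JRuby steps. A multi-table RENAME TABLE is performed atomically, which is what makes step 4 instantaneous for readers. One caveat worth checking: CREATE TABLE ... LIKE does not copy FOREIGN KEY definitions.

```sql
-- 1. Empty copies next to the live tables (hypothetical *_new names)
CREATE TABLE dwh.assessment_fact_new        LIKE dwh.assessment_fact;
CREATE TABLE dwh.assessment_answer_fact_new LIKE dwh.assessment_answer_fact;

-- 2./3. Load the new tables with the existing DWH steps, pointed at *_new
--   (placeholders: the slow INSERT..SELECT would then read
--    dwh.assessment_fact_new and write dwh.assessment_answer_fact_new)

-- 4. Atomic swap: other sessions see either the old or the new tables
RENAME TABLE dwh.assessment_fact            TO dwh.assessment_fact_old,
             dwh.assessment_fact_new        TO dwh.assessment_fact,
             dwh.assessment_answer_fact     TO dwh.assessment_answer_fact_old,
             dwh.assessment_answer_fact_new TO dwh.assessment_answer_fact;

-- 5. Drop the previous generation
DROP TABLE dwh.assessment_fact_old, dwh.assessment_answer_fact_old;
```

This does not make the INSERT..SELECT itself faster; it only removes the window in which the dwh tables are missing or half-built.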

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
