
MySQL Query Optimization for JOINs on Large Tables

I have a problem with a MySQL query that reads a large amount of data. Even with the query optimized to use a JOIN, it takes 122 seconds to return one week of data, and 526 seconds for one month. I want to optimize this query so that a full year of data takes less processing time. Alternatively, is there any way to optimize the MySQL settings in general?

Table details: the query uses two tables, mdiaries and tv_diaries, and I have indexed the relevant columns in both. The mdiaries table has 2,661,331 rows and tv_diaries has 27,074,645 rows.

mdiaries table:

  INDEX area (area),
  INDEX date (date),
  INDEX district (district),
  INDEX gaDivision (gaDivision),
  INDEX member_id (member_id),
  INDEX tv_channel_id (tv_channel_id),

tv_diaries table:

  INDEX area (area),
  INDEX date (date),
  INDEX district (district),
  INDEX member_id (member_id),
  INDEX timeslot_id (timeslot_id),
  INDEX tv_channel_id (tv_channel_id),

This is my query, which takes 122 seconds to execute:

$sql = "SELECT COUNT(TvDiary.id) AS m_count,TvDiary.date,TvDiary.timeslot_id,TvDiary.tv_channel_id,TvDiary.district,TvDiary.area
FROM `mdiaries` AS Mdiary INNER JOIN `tv_diaries` AS TvDiary ON Mdiary.member_id = TvDiary.member_id
WHERE Mdiary.date >= '2014-01-01' AND Mdiary.date <= '2014-01-07'
AND TvDiary.date >= '2014-01-01' AND TvDiary.date <= '2014-01-07'
GROUP BY TvDiary.date,
TvDiary.timeslot_id,
TvDiary.tv_channel_id,
TvDiary.district,
TvDiary.area";

This is my my.cnf file:

[mysqld]

## General
datadir                         = /var/lib/mysql
tmpdir                          = /var/lib/mysqltmp
socket                          = /var/lib/mysql/mysql.sock
skip-name-resolve
sql-mode                        = NO_ENGINE_SUBSTITUTION
#event-scheduler                = 1

## Networking
back-log                        = 100
#max-connections                = 200
max-connect-errors              = 10000
max-allowed-packet              = 32M
interactive-timeout             = 3600
wait-timeout                    = 600

### Storage Engines
#default-storage-engine         = InnoDB
innodb                          = FORCE

## MyISAM
key-buffer-size                 = 64M
myisam-sort-buffer-size         = 128M

## InnoDB
innodb-buffer-pool-size        = 16G
innodb_buffer_pool_instances    = 16
#innodb-log-file-size           = 100M
#innodb-log-buffer-size         = 8M
#innodb-file-per-table          = 1
#innodb-open-files              = 300

## Replication
server-id                       = 1
#log-bin                        = /var/log/mysql/bin-log
#relay-log                      = /var/log/mysql/relay-log
relay-log-space-limit           = 16G
expire-logs-days                = 7
#read-only                      = 1
#sync-binlog                    = 1
#log-slave-updates              = 1
#binlog-format                  = STATEMENT
#auto-increment-offset          = 1
#auto-increment-increment       = 2

## Logging
log-output                      = FILE
slow-query-log                  = 1
slow-query-log-file             = /var/log/mysql/slow-log
#log-slow-slave-statements
long-query-time                 = 2

##
query_cache_size        = 512M
query_cache_type        = 1
query_cache_limit       = 2M
join_buffer_size        = 512M
thread_cache_size       = 128

[mysqld_safe]
log-error                       = /var/log/mysqld.log
open-files-limit                = 65535

[mysql]
no-auto-rehash

This is your query:

SELECT COUNT(t.id) AS m_count, t.date, t.timeslot_id, t.tv_channel_id,
       t.district, t.area
FROM `mdiaries` m INNER JOIN
     `tv_diaries` t
     ON m.member_id = t.member_id
WHERE m.date >= '2014-01-01' AND m.date <= '2014-01-07' AND
      t.date >= '2014-01-01' AND t.date <= '2014-01-07'
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;

I would start with composite indexes: tv_diaries(date, member_id) and mdiaries(member_id, date).

This query is hard to speed up, but these indexes might help.
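A sketch of the two composite indexes suggested above, written as MySQL DDL, followed by an EXPLAIN to confirm the optimizer actually uses them. The index names idx_tv_date_member and idx_m_member_date are placeholders. Building an index on a table with about 27 million rows can take a while, so it is probably best run outside peak hours.

```sql
-- tv_diaries: date first for the range filter, member_id second so the
-- join key can be read from the same index.
CREATE INDEX idx_tv_date_member ON tv_diaries (date, member_id);

-- mdiaries: member_id first (equality lookup on the join key),
-- then date for the range condition.
CREATE INDEX idx_m_member_date ON mdiaries (member_id, date);

-- Check the "key" column of the output to see which indexes are chosen.
EXPLAIN
SELECT COUNT(t.id) AS m_count, t.date, t.timeslot_id, t.tv_channel_id,
       t.district, t.area
FROM mdiaries m
INNER JOIN tv_diaries t ON m.member_id = t.member_id
WHERE m.date >= '2014-01-01' AND m.date <= '2014-01-07'
  AND t.date >= '2014-01-01' AND t.date <= '2014-01-07'
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;
```

If the new indexes help, the existing single-column tv_diaries.date and mdiaries.member_id indexes become leftmost prefixes of them and are probably redundant. You could consider dropping them after testing.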

As described in the documentation, try adding a multiple-column index on all the columns referenced in the GROUP BY clause:

INDEX grp (date, timeslot_id, tv_channel_id, district, area)
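The index above is written as it would appear inside a CREATE TABLE statement. To add it to the existing tv_diaries table, an ALTER TABLE along these lines should work. It reuses the name grp from the answer. Treat this as something to test with EXPLAIN rather than a guaranteed win. MySQL can only use an index for GROUP BY when tv_diaries is the first table in the join order.

```sql
-- Column order matches the GROUP BY list, which may let MySQL read rows
-- in group order instead of building a temporary table and sorting it.
ALTER TABLE tv_diaries
  ADD INDEX grp (date, timeslot_id, tv_channel_id, district, area);
```

After adding the index, check whether "Using temporary; Using filesort" disappears from the Extra column of EXPLAIN. If it does, the index is being used for the grouping.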

I'm not sure, but this may give you better performance:

SELECT COUNT(t.id) AS m_count, t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area
FROM `mdiaries` m 
JOIN 
(
SELECT t.id, t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area, t.member_id 
FROM `tv_diaries` AS t
WHERE t.date >= '2014-01-01' AND t.date <= '2014-01-07' 
) t ON m.member_id = t.member_id
WHERE m.date >= '2014-01-01' AND m.date <= '2014-01-07' 
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;
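Whether this rewrite helps probably depends on the MySQL version. From MySQL 5.7 on, the optimizer normally merges a derived table like this one into the outer query (the derived_merge optimizer flag). In that case both versions get the same plan. Below is a sketch, assuming MySQL 5.7 or later, for checking the flag and, for comparison, forcing the derived table to be materialized in the current session only.

```sql
-- Returns 1 if the optimizer may merge derived tables into the outer query.
SELECT @@optimizer_switch LIKE '%derived_merge=on%' AS derived_merge_on;

-- For this session only: materialize derived tables instead of merging them,
-- then re-run EXPLAIN and the rewritten query and compare timings.
SET SESSION optimizer_switch = 'derived_merge=off';

-- Restore the server default for this session afterwards.
SET SESSION optimizer_switch = 'derived_merge=default';
```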

You should also review your database configuration, because I see the following issues:

  1. innodb_file_per_table = 1 is commented out. If the option is therefore off, all InnoDB data is stored in the single shared system tablespace (ibdata1) instead of in one .ibd file per table. Since MySQL 5.6.6 the default is ON, so check the actual value on your server.

  2. Increasing tmp_table_size and max_heap_table_size can improve performance, because you are reading from large tables. The effective limit for in-memory temporary tables is the smaller of the two values. If your query is creating temporary tables on disk, set both to at least 100M to keep them in memory.

  3. Because you are using GROUP BY, increasing sort_buffer_size can help. 2M is a reasonable value.

  4. join_buffer_size is far too high. It should be around 2M, and 8M at most, not 512M. It is allocated per session (and per join that needs one), so with several connections it can eat all your memory.

  5. query_cache_size is also set too high at 512M, so you can free memory here. Use a mysqltuner report to check whether you are actually benefiting from the query cache. If you are not, disable it.
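The settings from the list above can be inspected and tried at runtime from a MySQL client before you edit my.cnf. This sketch assumes MySQL 5.7 or earlier, because the query cache was removed in MySQL 8.0. It uses the values suggested in the list.

For the per-session buffers, SET GLOBAL only changes the default for new connections, and every SET GLOBAL value is lost on restart. Copy any value that helps into the [mysqld] section of my.cnf.

```sql
-- Is innodb_file_per_table really off? (Default is ON since MySQL 5.6.6.)
SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';

-- If Created_tmp_disk_tables grows quickly relative to Created_tmp_tables,
-- GROUP BY temporary tables are spilling to disk.
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- Query cache counters (hits, inserts, low-memory prunes).
SHOW GLOBAL STATUS LIKE 'Qcache%';

-- The in-memory temp table limit is the smaller of these two, so raise both.
SET GLOBAL tmp_table_size      = 104857600;  -- 100M
SET GLOBAL max_heap_table_size = 104857600;  -- 100M

-- Per-session buffers: keep them modest, since many can be allocated at once.
SET GLOBAL sort_buffer_size    = 2097152;    -- 2M
SET GLOBAL join_buffer_size    = 2097152;    -- 2M, down from 512M

-- Only if the Qcache counters show little benefit: release the 512M cache.
SET GLOBAL query_cache_size    = 0;
```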

Maybe you could use a materialized view to store the result of the query and refresh it periodically (monthly? every 15 days?). MySQL has no native materialized views, but a summary table refreshed by a scheduled job achieves the same effect.

This will not optimize the query itself, but your lookups will be much faster, because the counts will not be recalculated every time.
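Below is a minimal sketch of such a summary table. The table name tv_diary_counts is hypothetical, and the column types are guesses that should be adjusted to match tv_diaries.

Each refresh stores the result for one reporting period, keyed by its start and end dates. This matters because the JOIN's counts depend on the exact date range applied to mdiaries. Rows computed for one range therefore cannot simply be summed to reproduce another range. The refresh could be run from cron or from the event scheduler (commented out in the my.cnf above).

```sql
-- Hypothetical summary table; column types are assumptions.
CREATE TABLE tv_diary_counts (
  period_start  DATE        NOT NULL,
  period_end    DATE        NOT NULL,
  date          DATE        NULL,
  timeslot_id   INT         NULL,
  tv_channel_id INT         NULL,
  district      VARCHAR(64) NULL,
  area          VARCHAR(64) NULL,
  m_count       BIGINT      NOT NULL,
  KEY idx_period (period_start, period_end)
);

-- Refresh one period: delete its old rows, then recompute them.
SET @from = '2014-01-01', @to = '2014-01-07';

START TRANSACTION;
DELETE FROM tv_diary_counts
 WHERE period_start = @from AND period_end = @to;
INSERT INTO tv_diary_counts
  (period_start, period_end, date, timeslot_id, tv_channel_id,
   district, area, m_count)
SELECT @from, @to, t.date, t.timeslot_id, t.tv_channel_id,
       t.district, t.area, COUNT(t.id)
FROM mdiaries m
INNER JOIN tv_diaries t ON m.member_id = t.member_id
WHERE m.date >= @from AND m.date <= @to
  AND t.date >= @from AND t.date <= @to
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;
COMMIT;

-- Reports now read the small summary table instead of re-running the join.
SELECT date, timeslot_id, tv_channel_id, district, area, m_count
FROM tv_diary_counts
WHERE period_start = '2014-01-01' AND period_end = '2014-01-07';
```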
