Simple MySQL select query takes 4 hours

Question

I'd be grateful if you could help with a novice question. I'm running the following SQL:

INSERT INTO t03_hesid_history(uniqueID, extract_hesid, FIELD1, FIELD2)
SELECT uniqueID, hes_data_all_years.extract_hesid, FIELD1, FIELD2  
FROM hes_data_all_years  
INNER JOIN T02_hesid_grouped  
ON hes_data_all_years.extract_hesid = T02_hesid_grouped.extract_hesid;

The hes_data_all_years table has 188 million records and the T02_hesid_grouped table has 80,000 records. The T02_hesid_grouped table has a single (unique) field called extract_hesid, which is indexed. The hes_data_all_years table has many fields and a single index, on the extract_hesid field that is being joined.

The query aims to extract all records in hes_data_all_years that have a match in the T02_hesid_grouped table. I expect the output to contain 1-2 million records.

The query takes approximately 4 hours...

Is the length of time due to the dataset size, or is there some optimization that could be carried out? Many thanks!!

EXPLAIN output on the SELECT part is shown below:

id  select_type  table               type   possible_keys  key      key_len  ref                                  rows   Extra
1   SIMPLE       T02_hesid_grouped   index  I_HESID        I_HESID  43                                            79824  Using index
1   SIMPLE       hes_data_all_years  ref    I_HESID        I_HESID  43       hes.T02_hesid_grouped.extract_hesid  1      Using where

Answer 1

This could be a performance problem either with generating the result set or with inserting it into the destination table.
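
One rough way to tell the two apart is to time the join on its own, without writing anything. This is only a sketch using the table and column names from the question. Because COUNT(*) may be answered from the two indexes alone, it can understate the cost of fetching the other fields.

```sql
-- Time the join by itself; no rows are inserted.
-- If this alone takes hours, the join is the bottleneck rather than the INSERT.
SELECT COUNT(*)
FROM hes_data_all_years
INNER JOIN T02_hesid_grouped
    ON hes_data_all_years.extract_hesid = T02_hesid_grouped.extract_hesid;
```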

Ordinarily one doesn't use SELECT * for a result set that feeds an insert, but rather names the columns to select in the same order as the fields in the destination table. Your result set has two columns named extract_hesid. It seems unlikely that's what you want.
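
A minimal sketch of the explicit form follows. It assumes the destination table t03_hesid_history has columns named uniqueID, extract_hesid, FIELD1 and FIELD2. The aliases h and g are introduced here only for readability.

```sql
-- Name the destination columns and the selected columns in the same order.
-- Every selected column is qualified, so extract_hesid is taken exactly once.
INSERT INTO t03_hesid_history (uniqueID, extract_hesid, FIELD1, FIELD2)
SELECT h.uniqueID, h.extract_hesid, h.FIELD1, h.FIELD2
FROM hes_data_all_years AS h
INNER JOIN T02_hesid_grouped AS g
    ON h.extract_hesid = g.extract_hesid;
```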

What is the value of hes_data_all_years.extract_hesid for the rows in hes_data_all_years that don't match rows in T02_hesid_grouped? Things will be faster if those values aren't NULL.
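
A quick check along these lines, using the names from the question, would show how many rows carry a NULL in the join column (the alias null_hesid_rows is just illustrative):

```sql
-- Count rows whose join column is NULL; these can never match in the join.
SELECT COUNT(*) AS null_hesid_rows
FROM hes_data_all_years
WHERE extract_hesid IS NULL;
```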

Are your tables, especially the destination table, using MyISAM? Things will be faster if they are, because InnoDB is transaction oriented and has to generate rollback data while it's doing that INSERT of a couple of megarows.
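
If you aren't sure which engine each table uses, something like the following should tell you. The schema name 'hes' is an assumption taken from the ref column of your EXPLAIN output, so adjust it if your database is named differently.

```sql
-- List the storage engine of the three tables involved.
-- 'hes' is assumed to be the schema name; change it to match your database.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'hes'
  AND TABLE_NAME IN ('hes_data_all_years', 'T02_hesid_grouped', 't03_hesid_history');
```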

188 megarows isn't small, and your elapsed time isn't totally outrageous. It is long, but not absurdly so. You may want to check that your MySQL server has enough RAM. Or, if this is a yearly or one-time thing, you may want to simply declare victory and move on.
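
As a starting point for the RAM check, these two settings show how much memory the server is allowed to use for caching. Which one matters depends on the storage engine, and sensible values depend on your hardware, so treat this as a sketch rather than a tuning recipe.

```sql
-- InnoDB: memory used to cache table data and indexes (value in bytes).
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- MyISAM: memory used to cache index blocks (value in bytes).
SHOW VARIABLES LIKE 'key_buffer_size';
```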

Answer 1 posted 2013-02-04 23:36:38
