
Why does a MySQL inner join query take so much time?

In MySQL I have two tables.

tableA

col1   col2  SIM1 ..........col24
-----------------------------------
a       x     1             5 
b       y     1             3
c       z     0             2
d       g     2             1

tableB

colA   colB   SIM2
-------------------
x       g     1
y       f     0
x       s     0
y       e     2

The actual number of records in the two tables is 0.4 million.

I have a Java program from which I execute the SQL query using JDBC.

Here is the query:

     SELECT *
       FROM TableA
 INNER JOIN TableB ON TableA.SIM1 = TableB.SIM2
 INTO OUTFILE 'c:/test12226.csv'
 FIELDS TERMINATED BY ','
 ENCLOSED BY '"'
 LINES TERMINATED BY '\n'

This query takes a really long time. For my application to be feasible, it should not take more than 30 seconds. I understand that there are 0.4 million records, but the same operation in MS Access takes less than 10 seconds. Is the Java-MySQL combination more time-consuming than MS Access?

I have allocated 1 GB of RAM in the debug configuration. Please suggest.

My guess is that one or both of TableA.SIM1 and TableB.SIM2 aren't indexed. Either that, or they're different data types (e.g. VARCHAR and NUMERIC). Try:

CREATE INDEX index_name1 ON TableA (SIM1);
CREATE INDEX index_name2 ON TableB (SIM2);

Without indexes, that query will be really slow. One table will be accessed record by record, which is fine since you're outputting the whole table. To find the corresponding records in the other table, it needs to look them up based on the SIM1 = SIM2 relationship.

To find records in the other table without an index, it has to look through every record. This is a linear, or O(n), lookup. Put half a million records in each table and that's an awful lot of comparisons required to find all the matches (hundreds of billions, in fact).

With the indexes, the record matching is near-instant.
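
To make the difference concrete, here is a minimal Java sketch of the two strategies described above. It is only an illustration, not how MySQL is actually implemented: the class `JoinSketch`, its methods, and the `work` counter are hypothetical names, and a `HashMap` stands in for the index. The sample values are the SIM1 and SIM2 columns from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: work done by a join with and without an
// index on the joined column. Not MySQL's actual implementation.
class JoinSketch {
    // Key comparisons (or index probes) performed by the most recent join.
    static long work;

    // No index: for every row of A, scan every row of B.
    static List<int[]> nestedLoopJoin(int[] sim1, int[] sim2) {
        work = 0;
        List<int[]> matches = new ArrayList<>();
        for (int i = 0; i < sim1.length; i++) {
            for (int j = 0; j < sim2.length; j++) {
                work++; // one comparison per (A row, B row) pair
                if (sim1[i] == sim2[j]) {
                    matches.add(new int[] {i, j});
                }
            }
        }
        return matches;
    }

    // "Indexed": build a lookup structure on B once, then probe it per A row.
    static List<int[]> indexedJoin(int[] sim1, int[] sim2) {
        work = 0;
        Map<Integer, List<Integer>> index = new HashMap<>();
        for (int j = 0; j < sim2.length; j++) {
            index.computeIfAbsent(sim2[j], k -> new ArrayList<>()).add(j);
        }
        List<int[]> matches = new ArrayList<>();
        for (int i = 0; i < sim1.length; i++) {
            work++; // one index probe per row of A
            for (int j : index.getOrDefault(sim1[i], Collections.emptyList())) {
                matches.add(new int[] {i, j});
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] sim1 = {1, 1, 0, 2}; // TableA.SIM1 from the question
        int[] sim2 = {1, 0, 0, 2}; // TableB.SIM2 from the question
        int found = nestedLoopJoin(sim1, sim2).size();
        System.out.println(found + " matches, " + work + " comparisons without an index");
        found = indexedJoin(sim1, sim2).size();
        System.out.println(found + " matches, " + work + " probes with an index");
    }
}
```

Both methods return the same matches. The unindexed version does rows(A) × rows(B) comparisons, while the indexed version does one probe per row of A plus the work of emitting the matching rows.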

Think of it this way: indexing the columns is like putting a telephone book in alphabetical order. That makes it easy to find surnames. If the telephone book weren't sorted at all, how long would it take you to find someone's phone number?

Now multiply that by half a million.
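
The phone book analogy can be sketched in Java as well. This is a rough illustration under simple assumptions: the "book" is an array of integers, `PhoneBookSketch` and its methods are hypothetical names, and binary search stands in for what a sorted index lets the database do.

```java
// Hypothetical illustration of the phone book analogy.
class PhoneBookSketch {
    // Unsorted book: check entries one by one. Returns entries inspected.
    static int linearSteps(int[] book, int target) {
        int steps = 0;
        for (int entry : book) {
            steps++;
            if (entry == target) {
                break;
            }
        }
        return steps;
    }

    // Sorted book: halve the search range each time. Returns entries inspected.
    static int binarySteps(int[] sortedBook, int target) {
        int lo = 0;
        int hi = sortedBook.length - 1;
        int steps = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            steps++;
            if (sortedBook[mid] == target) {
                break;
            }
            if (sortedBook[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int n = 500_000;
        int[] book = new int[n];
        for (int i = 0; i < n; i++) {
            book[i] = i; // sorted; the linear scan simply ignores the order
        }
        int target = n - 1; // last entry: worst case for the linear scan
        System.out.println("linear scan:   " + linearSteps(book, target) + " entries inspected");
        System.out.println("binary search: " + binarySteps(book, target) + " entries inspected");
    }
}
```

For half a million entries the linear scan may have to inspect all 500,000 of them, while the binary search never needs more than 19.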

Are there indexes on TableA.SIM1 and TableB.SIM2?

When you perform an inner join between two tables containing 10,000 rows each, the server has to go through 10,000 × 10,000 row combinations (if the columns aren't indexed). If you want the join to be fast, you have to index TableA.SIM1 and TableB.SIM2. This will bring down the query execution time.
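
As a back-of-the-envelope check of those numbers, the following Java sketch estimates the lookup work for both cases. The cost model is an assumption made for illustration (about ceil(log2(n)) steps per indexed lookup), `JoinCostEstimate` is a hypothetical name, and the figures ignore the cost of writing out the matching rows.

```java
// Rough, hypothetical cost model; real engines differ in constants and details.
class JoinCostEstimate {
    // No index: every row of A is compared with every row of B.
    static long withoutIndex(long rowsA, long rowsB) {
        return rowsA * rowsB;
    }

    // Index on B: assume about ceil(log2(rowsB)) steps per lookup.
    static long withIndex(long rowsA, long rowsB) {
        long stepsPerLookup = 64 - Long.numberOfLeadingZeros(rowsB - 1);
        return rowsA * stepsPerLookup;
    }

    public static void main(String[] args) {
        long[] sizes = {10_000L, 400_000L}; // the answer's example and the asker's tables
        for (long n : sizes) {
            System.out.println(n + " rows per table: "
                    + withoutIndex(n, n) + " comparisons without an index, about "
                    + withIndex(n, n) + " steps with one");
        }
    }
}
```

Under this model, 10,000 rows per table means 100,000,000 comparisons without an index against roughly 140,000 steps with one. At 400,000 rows per table the gap grows to 160,000,000,000 against roughly 7,600,000.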

To create the indexes, use the following commands:

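-- MySQL requires an index name; the names used here are arbitrary.
CREATE INDEX idx_tablea_sim1 ON TableA (SIM1);
CREATE INDEX idx_tableb_sim2 ON TableB (SIM2);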


 