
Why does a MySQL inner join query take so much time?

In MySQL I have two tables:

tableA

col1   col2  SIM1 ..........col24
-----------------------------------
a       x     1             5 
b       y     1             3
c       z     0             2
d       g     2             1

tableB

colA   colB   SIM2
-------------------
x       g     1
y       f     0
x       s     0
y       e     2

Actually, the number of records in the two tables is 0.4 million.

I have a Java program from which I execute the SQL query using JDBC.

Here is the query:

    SELECT *
    FROM TableA
    INNER JOIN TableB ON TableA.SIM1 = TableB.SIM2
    INTO OUTFILE 'c:/test12226.csv'
    FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
    LINES TERMINATED BY '\n';

This query takes a really long time. For my application to be feasible, it should not take more than 30 seconds. I understand that there are 0.4 million records, but the same operation in MS Access takes less than 10 seconds. Is the Java-MySQL combination more time-consuming than MS Access?

I have allocated 1 GB of RAM in the debug configuration. Please suggest.

My guess is that one or both of TableA.SIM1 and TableB.SIM2 aren't indexed. Either that or they have different data types (e.g. VARCHAR and NUMERIC). Try:

CREATE INDEX index_name1 ON TableA (SIM1);
CREATE INDEX index_name2 ON TableB (SIM2);
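
To check both guesses before changing anything, the existing indexes and the column types can be inspected first. This is a minimal sketch using standard MySQL statements; it assumes the table and column names from the question:

```sql
-- List the existing indexes on each table; an empty result means none exist.
SHOW INDEX FROM TableA;
SHOW INDEX FROM TableB;

-- Show the declared type of each join column; the two types should match.
SHOW COLUMNS FROM TableA LIKE 'SIM1';
SHOW COLUMNS FROM TableB LIKE 'SIM2';
```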

Without indexes that query will be really slow. One table will be accessed record by record, which is fine since you're outputting the whole table. To find the corresponding record in the other table it needs to look up based on the SIM1 = SIM2 relationship.

To find records in the other table without an index, it has to look through every record. This is a linear, or O(n), lookup, and it is repeated for every record of the first table. Put half a million records in each table and that's an awful lot of comparisons to find all the matches: up to 500,000 × 500,000 = 250 billion in the worst case.

With the indexes the record matching is near-instant.
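
One way to see whether an index is actually being used is to run EXPLAIN on the join, leaving out the INTO OUTFILE clause. A sketch, assuming the tables from the question; the exact output varies by MySQL version:

```sql
-- Ask MySQL how it plans to execute the join, without running it.
EXPLAIN
SELECT *
FROM TableA
INNER JOIN TableB ON TableA.SIM1 = TableB.SIM2;
```

In the output, a row with type = ALL and key = NULL means that table is read with a full scan. Since every row is output, one table will typically still show ALL. Once the index exists, the other table would be expected to show an index name in the key column, although the optimizer may choose differently.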

Think of it this way: indexing the columns is like putting a telephone book in alphabetical order. That makes it easy to find surnames. If the telephone book wasn't sorted at all how long would it take you to find someone's phone number?

Now multiply that by half a million.

Are there indexes on TableA.SIM1 and TableB.SIM2?

When you perform an inner join between two tables containing 10000 rows each, MySQL has to go through 10000*10000 row combinations if the columns aren't indexed. If you want the join to be fast, you have to index TableA.SIM1 and TableB.SIM2. This will bring down the query execution time.

To create the indexes, use the following commands (MySQL requires an index name; the names here are arbitrary):

create index idx_sim1 on TableA (SIM1);
create index idx_sim2 on TableB (SIM2);

