How can I optimize my query to run faster?

Question

User table has

id, email, password, gender, dob

etc. Gender defaults to NULL. I have another table, user_gender, which has first_name and gender. My SQL query takes each user from User and picks the gender from user_gender based on the first name. The User table is huge, with somewhere around 300,000+ rows. I am running the query below, but it takes too much time. How do I optimize it?

select 
   count(*) 
from user u 
left outer join user_gender ug on ug.name = 
  case when locate(' ', u.name) > 0 then
     substring(u.name, 1,locate(' ', u.name))
  else
     u.name 
  end 
where 
  ug.gender != 'mf' and u.gender is null

Answer 1

First I advise a complete restructuring. 300000 rows are starting to hit the "medium data set" size...

normalize tables properly
- don't use columns that store more than one separate value; the name column is a fine example of what not to do. Make it two columns: first_name and last_name.
- DavidB mentioned the gender separation. This is total nonsense. At the least, everyone has a gender... If it is unknown, it could always be NULL...
use (preferably numeric!!) IDs instead of data fields (especially ones like names)
- this way, if a name changes (that can happen IRL), you have to update only one row...
- two people might even have exactly the same name...

Secondly, after the restructuring, you'll have to add indexes and check your queries' execution plans to make sure they are optimized appropriately.
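To make the restructuring concrete, here is a rough sketch of what the tables could look like in MySQL. The column types, sizes and key choices are my assumptions, not something given in the question; adapt them to your data.

```sql
-- Hypothetical normalized layout; types and sizes are assumptions.
CREATE TABLE user (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email      VARCHAR(255) NOT NULL,
    password   VARCHAR(255) NOT NULL,
    first_name VARCHAR(100) NOT NULL,   -- split out of the old name column
    last_name  VARCHAR(100) NULL,
    gender     CHAR(2)      NULL,       -- NULL means "unknown"
    dob        DATE         NULL,
    PRIMARY KEY (id),
    KEY idx_user_first_name (first_name)
);

-- Lookup table from first name to gender; the primary key doubles as the index.
CREATE TABLE user_gender (
    name   VARCHAR(100) NOT NULL,
    gender CHAR(2)      NOT NULL,
    PRIMARY KEY (name)
);

-- The join no longer needs any string surgery, so indexes can be used.
-- Prefix with EXPLAIN to inspect the execution plan.
EXPLAIN
SELECT COUNT(*)
FROM user u
JOIN user_gender ug ON ug.name = u.first_name
WHERE ug.gender <> 'mf' AND u.gender IS NULL;
```

In the EXPLAIN output, check the key column for each table to see which index, if any, the optimizer chose.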

Answer 2

Working on the design of these two tables first will help you most in solving your performance problem. The performance problem comes from the expression in your join clause:

case when locate(' ', u.name)>0 then substring(u.name, 1,locate(' ', u.name)) else u.name end

Use a primary key (user_id) for your User table and have this on your user_gender table and join accordingly.

OR

Since perhaps you are using a legacy database design and cannot add or use user_id fields, you may add a temporary first_name field and fill it using the expression from your join clause

update user u set u.first_name = case when locate(' ', u.name)>0 
then substring(u.name, 1,locate(' ', u.name)) else u.name end

After this you may rewrite your query as

select count(*) from user u left outer join user_gender ug 
on ug.name=u.first_name 
where ug.gender != 'mf' and u.gender is null

This will help your query run faster but I would propose the first solution, adding/using primary keys anyway.
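For the second option, the column has to exist before it can be filled, and it only pays off if it is indexed. A minimal sketch in MySQL follows; the column size and index names are assumptions. I have also swapped in SUBSTRING_INDEX as a shorter way to take everything before the first space. Unlike the CASE expression, it does not keep the trailing space, which may or may not matter depending on your collation.

```sql
-- Add the helper column (name and size are assumptions).
ALTER TABLE user ADD COLUMN first_name VARCHAR(100) NULL;

-- Fill it once. SUBSTRING_INDEX(name, ' ', 1) returns the text before the
-- first space, or the whole string when there is no space.
UPDATE user SET first_name = SUBSTRING_INDEX(name, ' ', 1);

-- Index both sides of the join so the lookup does not scan either table.
CREATE INDEX idx_user_first_name  ON user (first_name);
CREATE INDEX idx_user_gender_name ON user_gender (name);
```

Keep in mind that the helper column goes stale when rows are inserted or names change, so it has to be refreshed by the application or by rerunning the UPDATE.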

Answer 3

The first thing I note is that the query is not written correctly. Or, at least, it is not doing what you intend. The != in the where clause is "undoing" the left outer join. I think you want that in the on clause.
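To illustrate the difference, here is a small sketch using the question's table names. SUBSTRING_INDEX(u.name, ' ', 1) is just my shorthand for the CASE expression in the question, not something the original query uses.

```sql
-- Condition in WHERE: for users without a match, ug.gender is NULL,
-- and NULL != 'mf' is not true, so those rows are dropped.
-- The LEFT OUTER JOIN therefore behaves like an inner join.
SELECT COUNT(*)
FROM user u
LEFT OUTER JOIN user_gender ug
       ON ug.name = SUBSTRING_INDEX(u.name, ' ', 1)
WHERE ug.gender != 'mf' AND u.gender IS NULL;

-- Condition in ON: it only restricts which user_gender rows can match;
-- users without a match are kept, with NULLs in the ug columns.
SELECT COUNT(*)
FROM user u
LEFT OUTER JOIN user_gender ug
       ON ug.name = SUBSTRING_INDEX(u.name, ' ', 1)
      AND ug.gender != 'mf'
WHERE u.gender IS NULL;
```

Note that the second form counts every user with a NULL gender at least once, so whether it is the count you actually want depends on what the query is supposed to report.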

With an index on user_gender(name, gender) , I think this version should run pretty quickly:

select count(*) 
from (select u.*,
             (case when locate(' ', u.name) > 0 then substring(u.name, 1,locate(' ', u.name))
                   else u.name
              end) as FirstName
      from user u
     ) u
where not exists (select 1 from user_gender ug where ug.name = u.FirstName and ug.gender <> 'mf')

It should scan the user table, calculate the first name, and check in the index to see if there is a gender.
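For completeness, creating that index and checking the plan could look roughly like this. The index name is made up, and I am assuming the lookup column is called name, as it is in the query.

```sql
-- Composite index so the NOT EXISTS probe can be answered from the index.
-- The index name is hypothetical; the column name is taken from the query.
CREATE INDEX idx_user_gender_name_gender ON user_gender (name, gender);

-- Prefix the query with EXPLAIN to see whether the index is picked up.
EXPLAIN
SELECT COUNT(*)
FROM (SELECT u.*,
             (CASE WHEN LOCATE(' ', u.name) > 0 THEN SUBSTRING(u.name, 1, LOCATE(' ', u.name))
                   ELSE u.name
              END) AS FirstName
      FROM user u
     ) u
WHERE NOT EXISTS (SELECT 1 FROM user_gender ug WHERE ug.name = u.FirstName AND ug.gender <> 'mf');
```

If the index is being used, it should show up in the key column of the EXPLAIN row for ug.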

3 answers

solution1
6 2013-02-28 13:48:52

solution2
3 2013-02-28 13:54:21

solution3
0 2013-02-28 14:16:38