简体   繁体   中英

Simple MySQL query is taking over two hours

So, I have a database with 5 million records and it stores the information for a question and answer site. The structure is...

question, qid, quserid, answer, auserid;

(qid is the id number for the question)

I'm trying to find all the users that have questions NOT answered by a particular user and sorting it by the amount of questions that weren't answered by a particular user. Here's my query:

SELECT quserid, COUNT(quserid)
FROM `qanda` 
WHERE qid NOT IN (SELECT qid FROM `qanda` WHERE auserid='myusername' GROUP BY qid) 
GROUP BY quserid 
ORDER BY COUNT(quserid) DESC 
LIMIT 0,1000;

The problem: it's taking over 2 hours and the clock is still ticking away! Anyone know how to speed this database up? I think there's a bug in the database or something, usually it only takes me 30 seconds for a simple query at most, so this is getting a bit ridiculous. Anyone know any tweaks? Possibly a simple change to the config file or something?

..........

Here's some data from the database I just copied and pasted. Sorry for the lack of formatting.

you could have any one person in the entire wor...  greendaystud    ive got the person i want...its great...because sh...   •glitter•rock•  191437  If you could have any one person in the entire wor...   just~another~slave2tears    i already got em
                            •glitter•rock•  191437  If you could have any one person in the entire wor...   korn_chick2007  matt or chris... i have feelings for them
                            •glitter•rock•  189555  why are you so sexy?
    just~another~slave2tears    my b/f says i am...i dun tink so tho
                            •glitter•rock•  189555  why are you so sexy?
    korn_chick2007  im not
                            •glitter•rock•  189555  why are you so sexy?
    MyKool-AidsSexy     i dont think i am
                            †brokengirl†    115228  If you are supposed to expect the unexpected,
doe...  death-tone  yip
                            †brokengirl†    115228  If you are supposed to expect the unexpected,
doe...  _doieverknowwhoiam_     you know whats weird? my friend sandy says that a ...
                            †brokengirl†    115228  If you are supposed to expect the unexpected,
doe...  Cute_Physco_kitty   Pretty much..
                            †brokengirl†    115228  If you are supposed to expect the unexpected,
doe...  Leslie02    WHAT! OK, now im confused!
                            †brokengirl†    114995  Why does my listerine taste like sausage this
mor...  death-tone  what's listerine?
                            †brokengirl†    114995  Why does my listerine taste like sausage this
mor...  _doieverknowwhoiam_     i don't know, and maybe it's jut me bu...
                            †brokengirl†    114995  Why does my listerine taste like sausage this
mor...  darksunofdeath  How old is the listerine pack?
                            †brokengirl†    114995  Why does my listerine taste like sausage this
mor...  Cute_Physco_kitty   uhh... New brand of Listerine?
    †brokengirl†    114995  Why does my listerine taste like sausage this
mor...  Leslie02    did you have sausage for breakfast?     †brokengirl†    104305  What should I name my pinky toe on my left
foot?¿...   death-tone  "Pinkytoe"        

And, the expected output, using convenient column titles...

Questioner User ID | Number of questions asked by the Questioner that were unanswered by 'myuserid'

Greenbay Packer | 6
DollyDoll | 63
PsychoticPokemon | 62
HelloKitty | 61
GreenDayFan | 60

...

IDontAskManyQuestion | 2<br>
WhatsAQuestion? | 1<br>

And here's the EXPLAIN output

> mysql-> EXPLAIN
>     ->
>     -> SELECT quserid, COUNT(quserID)
>     -> FROM `qanda`
>     -> WHERE qid NOT IN (SELECT qid FROM `qanda` WHERE auserid='boxocereal' GROU P BY qid)
>     -> GROUP BY quserid
>     -> ORDER BY COUNT(quserid) DESC
>     -> LIMIT 0,1000;
> 

+----+--------------------+-------+------+---------------+------+---------+-----
> -+---------+----------------------------------------------+ | id | select_type        | table | type | possible_keys | key  | key_len |
> ref  | rows    | Extra                                        |
> +----+--------------------+-------+------+---------------+------+---------+-----
> -+---------+----------------------------------------------+ |  1 | PRIMARY            | qanda | ALL  | NULL          | NULL | NULL    |
> NULL  | 3167995 | Using where; Using temporary; Using filesort | |  2
> | DEPENDENT SUBQUERY | qanda | ALL  | NULL          | NULL | NULL    |
> NULL  | 3167995 | Using where; Using temporary; Using filesort |
> +----+--------------------+-------+------+---------------+------+---------+-----
> -+---------+----------------------------------------------+ 2 rows in set (0.02 sec)
> 
> mysql->

Okay, I have an idea for translating this into a JOIN instead of a subquery. The technique is to actually JOIN where there is and answer from that user (since that is radically more efficient), and then just exclude those questions from the final results (in HAVING). It is possible that this could be improved further (I haven't tested, but the IS NULL check could possibly be moved into a WHERE clause instead).

Does this get you where you want to be?

SELECT question.quserid, COUNT(question.quserid) as num_questions
FROM qanda as question 
LEFT OUTER JOIN qanda as answers 
  ON question.qid = answers.qid AND answers.auserid = 'myusername'
GROUP BY question.quserid
ORDER BY num_questions DESC
HAVING answers.auserid IS NULL;

EDIT: a bit more explanation, in case this is close and someone else can help refine the idea into a solution.

Basically, the main part of the query will select every question exactly once (LEFT OUTER), and then for the questions which the user of interest has answered, it will also select the JOIN'd columns. This brings us to a state of have all questions in the candidate result set, with 'flags' attached to the ones we are not interested in (ie non-null joined columns). Then we group and reject the rows with non-null join'd data.

根据输出,如果正在进行全表扫描,请尝试找出可以实现索引的候选项。

With reference to @ctrahey answer I have changed query and got solution working as @ctrahey was not giving output as yours as shown here .

Could you try with and let me know if this is faster?

SELECT question.quserid, COUNT(question.quserid) as num_questions
FROM qanda as question 
LEFT OUTER JOIN qanda as answers 
  ON question.qid = answers.qid AND answers.auserid = 'user2'
WHERE answers.auserid IS NULL
GROUP BY question.quserid
ORDER BY num_questions DESC;

Also see sqlfiddle . Your query and above query are giving same output.

Hope this is what you wanted.

Try this one, too:

SELECT quserid, COUNT(*) AS num
FROM qanda
WHERE qid NOT IN 
        ( SELECT qid
          FROM qanda 
          WHERE auserid = 'user2' 
        )  
GROUP BY quserid 
ORDER BY num DESC 
LIMIT 0,1000 ;

You'll definitely need to add indices for any of these variants to be efficient.

Is the (qid, quserid, auserid) the PRIMARY key? Is it UNIQUE (I guess it should be one of the two.

An index on (quserid, qid) would also help (if your table uses the InnoDB engine).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM