I have a dictionary in a MySQL table; the table contains 240,000 words. If, for example, I have the letters G, I, G, S, N and O, I would like to select all the words in the table that contain all or some of these letters (and no other letters).
Acceptable words would for example include:
Examples of unacceptable words:
What would the MySQL query look like?
My current MySQL query looks like:
SELECT * FROM `list`
WHERE word like '%S%' and word like '%O%' and word like '%G%'
I want to use 6 or 7 letters and find words that are:
Now I only find words that are equally long or longer and that contain other letters as well.
This is a starting point:
(You will need to construct the query from the letters you want to use.)
If the column has only one word:
WHERE word REGEXP '^[GISNO]+$'
If the column can hold multiple words, the following picks the row (but does not tell you which word matched). Before version 8.0:
WHERE word REGEXP '[[:<:]][GISNO]+[[:>:]]'
Or, with 8.0:
WHERE word REGEXP '\\b[GISNO]+\\b'
Now to filter out "too many" of each letter. (I will assume the word is by itself in the column.)
AND word NOT REGEXP 'G.*G.*G' -- max of 2 Gs
AND word NOT REGEXP 'I.*I' -- max of 1 I
AND word NOT REGEXP 'N.*N' -- max of 1 N
AND word NOT REGEXP 'O.*O' -- max of 1 O
AND word NOT REGEXP 'S.*S' -- max of 1 S
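Since the query has to be generated from the chosen letters, the clauses above could be assembled in application code. Here is a minimal Python sketch, assuming the letters are plain ASCII letters (so they are safe inside a character class and can be inlined into the SQL text) and that the word sits by itself in a column named word. The function name build_where is made up for illustration.

```python
from collections import Counter

def build_where(letters):
    """Build a WHERE clause for words made only of the given letters,
    using each letter no more often than it appears in `letters`."""
    letters = letters.upper()
    # Only plain letters are accepted, so nothing needs escaping below.
    if not (letters.isascii() and letters.isalpha()):
        raise ValueError("expected ASCII letters only")
    counts = Counter(letters)
    char_class = "".join(sorted(counts))
    clauses = [f"word REGEXP '^[{char_class}]+$'"]
    for letter, n in sorted(counts.items()):
        # n + 1 occurrences of the letter, anywhere in the word, is one too many.
        too_many = ".*".join([letter] * (n + 1))
        clauses.append(f"word NOT REGEXP '{too_many}'")
    return "WHERE " + "\n  AND ".join(clauses)

print(build_where("GIGSNO"))
```

The sketch emits one NOT REGEXP test per distinct letter, so a letter that appears twice in the input (G here) gets a pattern with three occurrences.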
Another approach involves building an extra column with the letters alphabetized.
word    sorted_word
going   ggino
song    gnos
son     nos
so      os
on      no
no      no   -- note the dup in the new column
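The values for the new column could be computed when the dictionary is loaded. Here is a minimal Python sketch, assuming the words are lowercased before sorting. The helper name sorted_key is hypothetical.

```python
def sorted_key(word):
    """Return the word's letters in alphabetical order, e.g. for a sorted_word column."""
    return "".join(sorted(word.lower()))

for w in ["going", "song", "son", "so", "on", "no"]:
    print(w, sorted_key(w))
```

Anagrams such as "on" and "no" end up with the same key, which is why the new column contains duplicates.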
Now the test becomes
WHERE sorted_word REGEXP '^g{0,2}i?n?o?s?$'
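This pattern can also be generated from the letters. Here is a minimal Python sketch, assuming ASCII letters only. The names build_sorted_pattern and is_allowed are made up for illustration, and is_allowed only mimics the SQL test locally with Python's re module, which should behave the same as MySQL for a pattern this simple.

```python
import re
from collections import Counter

def build_sorted_pattern(letters):
    """Build a regex for alphabetized words that use each letter
    at most as often as it appears in `letters`."""
    letters = letters.lower()
    if not (letters.isascii() and letters.isalpha()):
        raise ValueError("expected ASCII letters only")
    counts = Counter(letters)
    parts = []
    for letter in sorted(counts):
        n = counts[letter]
        # A single copy becomes "x?", several copies become "x{0,n}".
        parts.append(f"{letter}?" if n == 1 else f"{letter}{{0,{n}}}")
    return "^" + "".join(parts) + "$"

def is_allowed(word, letters):
    """Local stand-in for: sorted_word REGEXP <pattern>."""
    sorted_word = "".join(sorted(word.lower()))
    return re.fullmatch(build_sorted_pattern(letters), sorted_word) is not None

print(build_sorted_pattern("GIGSNO"))
```

Note that the pattern also matches an empty string, which should not matter for a dictionary table.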
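This should run somewhat faster, since each row is checked against one regular expression instead of several. (It is still a full table scan; REGEXP cannot use an index.)
Other lookups may also run faster with this extra column. For example, finding exact anagrams becomes a plain equality test on sorted_word, which can use an index.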