简体   繁体   中英

Select MYSQL data (words) that contain certain characters

I have a dictionary in an MYSQL table, the table consists of 240 000 words. If I for example have the letters G , I , G , S , N and O I would like to select all the words in the table that contain all or some of these letters (and no other letters).

Acceptable words would for example include:

  • going
  • song
  • son
  • so
  • on

Examples of unacceptable words:

  • songs (one S more than allowed)
  • words longer that the number of characters (6 in this case)

What would the MYSQL query look like?

My current MYSQL looks like:

SELECT * FROM `list` 
WHERE word like '%S%' and word like '%O%' and word like '%G%'

I want to use 6 or 7 letters and find words that are:

  • equally long
  • shorter

Now I only find words that are equally long or longer and that contain other letters as well.

This is a starting point:

(I will insist that you construct the query from the letters you desire.)

If the column has only one word:

WHERE word REGEXP '^[GISNO]+$'

If the column can have multiple words, this will pick the row (but not the word), then before version 8.0:

WHERE word REGEXP '[[:<:]][GISNO]+[[:>:]]'

Or, with 8.0:

WHERE word REGEXP '\b[GISNO]+b'

Now to filter out "too many" of each letter. (I will assume the word is by itself in the column.)

  AND word NOT REGEXP 'G.*G.*G'   -- max of 2 Gs
  AND word NOT REGEXP 'I.*I'      -- max of 1 I
  AND word NOT REGEXP 'O.*O'      -- max of 1 O
  AND word NOT REGEXP 'S.*S'      -- max of 1 S

Another approach involves building an extra column with the letters alphabetized.

going   ggino
song    gnos
son     nos
so      os
on      no
no      no   -- note the dup in the new column

Now the test becomes

WHERE sorted_word REGEXP '^g{0,2}i?n?o?s?$'

This should run somewhat faster.

And some other things may run faster using this trick.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM