Selecting Strings With Alphabetized Characters - In SQL Server 2008 R2

This is a recreational pursuit, and is not homework. If you value academic challenges, please read on.

A radio quiz show had a segment asking listeners to call in with words whose characters are in alphabetical order, e.g. "aim", "abbot", "celt", "deft", etc. I got these few examples from a quick Notepad++ (NPP) inspection of a Scrabble dictionary word list.

I'm looking for an elegant way in T-SQL to determine whether a word qualifies for the list, i.e. all its letters are in alphabetical order, case insensitive.

It seemed to me that there should be some kind of T-SQL algorithm that will do a SELECT on a table of English words and return the complete list of all words in the Scrabble dictionary that meet the spec. I've spent considerable time looking at regex strings, but haven't hit on anything that comes even remotely close. I've thought about the obvious looping scenario, but abandoned it for now as "inelegant". I'm looking for your ideas that will obtain the qualifying word list,

preferably using
- a REGEX expression
- a tally-table-based approach
- a scalar UDF that returns 1 if the input word meets the requirement, else 0.
- Other, only limited by your creativity.

But preferably NOT using
- a looping structure
- a recursive solution
- a CLR solution

Assumptions/observations:
1. A "word" is defined here as two or more characters. My dictionary shows 55 2-character words, of which only 28 qualify.
2. No word will have more than two consecutive characters that are identical. (If you find one, please point it out.)
3. At 21 characters, "electroencephalograms" is the longest word in my Scrabble dictionary
(though why that word is in the Scrabble dictionary escapes me--the board is only a 15-by-15 grid.)
Consider 21 as the upper limit on word length.
4. All words LIKE 'Z%' can be dismissed because all you can create is {'Z','ZZ', ... , 'ZZZ...Z'}.
5. As the initial character of the dictionary's words proceeds through the alphabet, fewer words will qualify.
6. As the word lengths get longer, fewer words will qualify.
7. I suspect that fewer than 0.2% of my dictionary's 60,387 words will qualify.

For example, I've tried NPP regex searches like "^a[a-z][b-z][b-z][c-z][c-z][d-z][d-z][e-z]" for 9-letter words starting with "a", but this does not enforce alphabetical order character by character. The search returns "abilities", which fails the test because an "i" follows the "l".

There are several free Scrabble word lists available on the web, but Phil Factor gives a really interesting treatment of T-SQL/Scrabble considerations at https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-scrabble-and-rapping/ which is where I got my word list.

Care to give it a shot?

Split the word into individual characters using a numbers table. Use the numbers as one set of indices. Use ROW_NUMBER, ordered by letter and then by position, to create another set. Compare the two sets of indices to see if they match for every character. If they do, the letters in the word are in alphabetical order.

DECLARE @Word varchar(100) = 'abbot';

WITH indexed AS (
  SELECT
    Index1 = n.Number,
    Index2 = ROW_NUMBER() OVER (ORDER BY x.Letter, n.Number),
    x.Letter
  FROM
    dbo.Numbers AS n
  CROSS APPLY
    (SELECT SUBSTRING(@Word, n.Number, 1)) AS x (Letter)
  WHERE
    n.Number BETWEEN 1 AND LEN(@Word)
)
SELECT
  Conclusion = CASE COUNT(NULLIF(Index1, Index2))
                 WHEN 0 THEN 'Alphabetical'
                 ELSE    'Not alphabetical'
               END
FROM
  indexed
;

The NULLIF(Index1, Index2) expression does the comparison: it returns NULL if the arguments are equal, otherwise it returns the value of Index1. If all indices match, all the results will be NULL and COUNT will return 0, which means the letters in the word are in alphabetical order.
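
The same comparison can be run against a whole table of words instead of a single variable. The following is only a sketch under some assumptions: @Words is a hypothetical stand-in for the real dictionary table, dbo.Numbers is the same numbers table used above, and each word is assumed to appear only once (duplicates would share a partition and skew the row numbers). UPPER is added so the result does not depend on a case-insensitive collation.

```sql
-- @Words is a hypothetical sample table; replace it with the real word list.
DECLARE @Words TABLE (Word varchar(21) NOT NULL);
INSERT INTO @Words (Word)
VALUES ('aim'), ('abbot'), ('celt'), ('deft'), ('abilities');

WITH indexed AS (
  SELECT
    w.Word,
    -- Position of the letter within the word.
    Index1 = n.Number,
    -- Position the letter would have if the word's letters were sorted.
    Index2 = ROW_NUMBER() OVER (PARTITION BY w.Word
                                ORDER BY x.Letter, n.Number)
  FROM
    @Words AS w
  INNER JOIN
    dbo.Numbers AS n ON n.Number BETWEEN 1 AND LEN(w.Word)
  CROSS APPLY
    (SELECT UPPER(SUBSTRING(w.Word, n.Number, 1))) AS x (Letter)
)
SELECT
  Word
FROM
  indexed
GROUP BY
  Word
-- A word qualifies when no position differs from its sorted position.
HAVING
  COUNT(NULLIF(Index1, Index2)) = 0
;
```

With the sample rows above, this should return "aim", "abbot", "celt" and "deft" and leave out "abilities".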

Interesting idea...

Here's my take on it. This returns a list of words that are in order, but you could easily return 1 instead.

DECLARE @WORDS TABLE (VAL VARCHAR(MAX))
INSERT INTO @WORDS (VAL)
VALUES ('AIM'), ('ABBOT'), ('CELT'), ('DAVID')

;WITH CHARS
AS
(
    SELECT VAL AS SOURCEWORD, UPPER(VAL) AS EVALWORD, ASCII(LEFT(UPPER(VAL),1)) AS ASCIICODE,
           -- Uppercase the remainder as well, so mixed-case input is compared consistently.
           RIGHT(UPPER(VAL),LEN(VAL)-1) AS REMAINS, 1 AS ROWID, 1 AS INORDER, LEN(VAL) AS WORDLENGTH
    FROM @WORDS
    UNION ALL
    SELECT SOURCEWORD, REMAINS, ASCII(LEFT(REMAINS,1)), RIGHT(REMAINS,LEN(REMAINS)-1), ROWID+1, INORDER+CASE WHEN ASCII(LEFT(REMAINS,1)) >= ASCIICODE THEN 1 ELSE 0 END AS INORDER, WORDLENGTH
    FROM CHARS
    WHERE LEN(REMAINS)>=1
),
ONLYINORDER
AS 
(
    SELECT *
    FROM CHARS
    WHERE ROWID=WORDLENGTH AND INORDER=WORDLENGTH
)

SELECT SOURCEWORD
FROM ONLYINORDER

Here it is as a UDF:

CREATE FUNCTION dbo.AlphabetSoup (@Word VARCHAR(MAX))
RETURNS BIT
AS
BEGIN
    SET @WORD = UPPER(@WORD)

    DECLARE @RESULT INT
    ;WITH CHARS
    AS
    (
        SELECT @WORD AS SOURCEWORD, 
               @WORD AS EVALWORD, 
               ASCII(LEFT(@WORD,1)) AS ASCIICODE, 
               RIGHT(@WORD,LEN(@WORD)-1) AS REMAINS, 
               1 AS ROWID, 
               1 AS INORDER, 
               LEN(@WORD) AS WORDLENGTH
        UNION ALL
        SELECT SOURCEWORD, 
               REMAINS, 
               ASCII(LEFT(REMAINS,1)), 
               RIGHT(REMAINS,LEN(REMAINS)-1), 
               ROWID+1, 
               INORDER+CASE WHEN ASCII(LEFT(REMAINS,1)) >= ASCIICODE THEN 1 ELSE 0 END AS INORDER, 
               WORDLENGTH
        FROM CHARS
        WHERE LEN(REMAINS)>=1
    )
    -- Only the last row of the recursion (ROWID = WORDLENGTH) carries the verdict,
    -- so read that single row instead of assigning from a multi-row result.
    SELECT @RESULT = CASE WHEN INORDER = WORDLENGTH THEN 1 ELSE 0 END
    FROM CHARS
    WHERE ROWID = WORDLENGTH

    RETURN @RESULT
END
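
A possible way to use the function against a list of words, as a sketch only: @Sample is a hypothetical stand-in for the real dictionary table, and dbo.AlphabetSoup is assumed to have been created already. Note that a scalar UDF is evaluated once per row, so this may be slow on the full word list.

```sql
-- @Sample is a hypothetical sample table; replace it with the real word list.
DECLARE @Sample TABLE (Word VARCHAR(21) NOT NULL);
INSERT INTO @Sample (Word)
VALUES ('aim'), ('abbot'), ('celt'), ('david');

-- Keep only the words the function flags as being in alphabetical order.
SELECT Word
FROM @Sample
WHERE dbo.AlphabetSoup(Word) = 1;
```

The intent is that "david" is filtered out and the other three words are returned.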

I did something similar to Andriy. I created a numbers table with the values 1-21. I use it to create one set of data with the individual letters ordered by index, and a second set ordered alphabetically. I then join the sets to each other on the letter and the row number, and count the rows that have no match. Anything over 0 means the word is not in order.

DECLARE @word VARCHAR(21) 

SET @word = 'abbot' 

SELECT Count(1) 
FROM   (SELECT Substring(@word, number, 1) AS Letter, 
           Row_number() OVER ( ORDER BY number)        AS letterNum 
        FROM   numbers 
        WHERE  number <= CONVERT(INT, Len(@word))) a 
LEFT OUTER JOIN (SELECT Substring(@word, number, 1) AS letter, 
                           Row_number() OVER ( ORDER BY Substring(@word, number, 1)) AS letterNum 
                    FROM   numbers 
                    WHERE  number <= CONVERT(INT, Len(@word))) b 
  ON a.letternum = b.letternum 
    AND a.letter = b.letter 
WHERE  b.letter IS NULL 
