简体   繁体   中英

What's a good way for figuring out all possible words of a given length

I'm trying to create an algorithm in C# which produces the following output strings:

AAAA
AAAB
AAAC
...and so on...
ZZZX
ZZZY
ZZZZ

What is the best way to accomplish this?

public static IEnumerable<string> GetWords()
{
    //Perform algorithm
    yield return word;
}

well, if the length is a constant 4, then this would handle it:

public static IEnumerable<String> GetWords()
{
    for (Char c1 = 'A'; c1 <= 'Z'; c1++)
    {
        for (Char c2 = 'A'; c2 <= 'Z'; c2++)
        {
            for (Char c3 = 'A'; c3 <= 'Z'; c3++)
            {
                for (Char c4 = 'A'; c4 <= 'Z'; c4++)
                {
                    yield return "" + c1 + c2 + c3 + c4;
                }
            }
        }
    }
}

if the length is a parameter, this recursive solution would handle it:

public static IEnumerable<String> GetWords(Int32 length)
{
    if (length <= 0)
        yield break;

    for (Char c = 'A'; c <= 'Z'; c++)
    {
        if (length > 1)
        {
            foreach (String restWord in GetWords(length - 1))
                yield return c + restWord;
        }
        else
            yield return "" + c;
    }
}

There's always the obligatory LINQ implementation. Most likely rubbish performance, but since when did performance get in the way of using cool new features?

var letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();

var sequence = from one in letters
               from two in letters
               from three in letters
               from four in letters
               orderby one, two, three, four
               select new string(new[] { one, two, three, four });

'sequence' will now be an IQueryable that contains AAAA to ZZZZ.

Edit:

Ok, so it was bugging me that it should be possible to make a sequence of configurable length with a configurable alphabet using LINQ. So here it is. Again, completely pointless but it was bugging me.

public void Nonsense()
{
    var letters = new[]{"A","B","C","D","E","F",
                        "G","H","I","J","K","L",
                        "M","N","O","P","Q","R","S",
                        "T","U","V","W","X","Y","Z"};

    foreach (var val in Sequence(letters, 4))
        Console.WriteLine(val);
}

private IQueryable<string> Sequence(string[] alphabet, int size)
{
    // create the first level
    var sequence = alphabet.AsQueryable();

    // add each subsequent level
    for (var i = 1; i < size; i++)
        sequence = AddLevel(sequence, alphabet);

    return from value in sequence
           orderby value
           select value;
}

private IQueryable<string> AddLevel(IQueryable<string> current, string[] characters)
{
    return from one in current
           from character in characters
           select one + character;
}

The call to the Sequence method produces the same AAAA to ZZZZ list as before but now you can change the dictionary used and how long the produced words will be.

Just a coment to Garry Shutler, but I want code coloring:

You really don't need to make it IQuaryable, neither the sort, so you can remove the second method. One step forwad is to use Aggregate for the cross product, it end up like this:

IEnumerable<string> letters = new[]{
                "A","B","C","D","E","F",                       
                "G","H","I","J","K","L",
                "M","N","O","P","Q","R","S",           
                "T","U","V","W","X","Y","Z"};

var result = Enumerable.Range(0, 4)
                .Aggregate(letters, (curr, i) => curr.SelectMany(s => letters, (s, c) => s + c));

foreach (var val in result)
     Console.WriteLine(val);

Anders should get a Nobel prize for the Linq thing!

GNU Bash!

{a..z}{a..z}{a..z}{a..z}

Inspired by Garry Shutler's answer, I decided to recode his answer in T-SQL.

Say "Letters" is a table with only one field, MyChar, a CHAR(1). It has 26 rows, each an alphabet's letter. So we'd have (you can copy-paste this code on SQL Server and run as-is to see it in action):

DECLARE @Letters TABLE (
    MyChar CHAR(1) PRIMARY KEY
)
DECLARE @N INT
SET @N=0
WHILE @N<26 BEGIN
    INSERT @Letters (MyChar) VALUES ( CHAR( @N + 65) )
    SET @N = @N + 1
END
-- SELECT * FROM @Letters ORDER BY 1

SELECT A.MyChar, B.MyChar, C.MyChar, D.MyChar
FROM @Letters A, Letters B, Letters C, Letters D
ORDER BY 1,2,3,4

The advantages are: It's easily extensible into using capital/lowercase, or using non-English Latin characters (think "Ñ" or cedille, eszets and the like) and you'd still be getting an ordered set, only need to add a collation. Plus SQL Server will execute this slightly faster than LINQ on a single core machine, on multicore (or multiprocessors) the execution can be in parallel, getting even more boost.

Unfortunately, it's stuck for the 4 letters specific case. lassevk's recursive solution is more general, trying to do a general solution in T-SQL would necessarily imply dynamic SQL with all its dangers.

Python!

(This is only a hack, dont' take me too seriously :-)

# Convert a number to the base 26 using [A-Z] as the cyphers
def itoa26(n): 
   array = [] 
   while n: 
      lowestDigit = n % 26
      array.append(chr(lowestDigit + ord('A'))) 
      n /= 26 
   array.reverse() 
   return ''.join(array)

def generateSequences(nChars):
   for n in xrange(26**nChars):
      string = itoa26(n)
      yield 'A'*(nChars - len(string)) + string

for string in generateSequences(3):
   print string

Haskell!

replicateM 4 ['A'..'Z']

Ruby!

('A'*4..'Z'*4).to_a

This is an recursive version of the same functions in C#:

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace ConsoleApplication1Test
{
    class Program
    {
        static char[] my_func( char[] my_chars, int level)
        {
            if (level > 1)
                my_func(my_chars, level - 1);
            my_chars[(my_chars.Length - level)]++;
            if (my_chars[(my_chars.Length - level)] == ('Z' + 1))
            {
                my_chars[(my_chars.Length - level)] = 'A';
                return my_chars;
            }
            else
            {
                Console.Out.WriteLine(my_chars);
                return my_func(my_chars, level);
            }
        }
        static void Main(string[] args)
        {
            char[] text = { 'A', 'A', 'A', 'A' };
            my_func(text,text.Length);
            Console.ReadKey();
        }
    }
}

Prints out from AAAA to ZZZZ

Simpler Python!

def getWords(length=3):
    if length == 0: raise StopIteration
    for letter in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
        if length == 1: yield letter
        else:
            for partialWord in getWords(length-1):
                yield letter+partialWord

javascript!

var chars = 4, abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ", top = 1, fact = [];
for (i = 0; i < chars; i++) { fact.unshift(top); top *= abc.length; }
for (i = 0; i < top; i++)
{
    for (j = 0; j < chars; j++) 
        document.write(abc[Math.floor(i/fact[j]) % abc.length]);
    document.write("<br \>\n");
}

Use something which automatically Googles for every single letter combination, then see if there are more ".sz" or ".af" hits then ".com" hits at the five first results ... ;)

Seriously, what you're looking for might be Tries (data structure) though you still need to populate the thing which probably is far harder...

A very simple but awesome code that generates all words of 3 and 4 letters of english language

#include <iostream>
using namespace std;
char alpha[26]={'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'}
int main() {
int num;
cin >> num;
if (num == 3) { //all 3 letter words
    for (int i = 0; i <= 25; i++) {
        for (int o = 0; o <= 25; o++) {
            for (int p = 0; p <= 25; p++) {
                cout << alpha[i] << alpha[o] << alpha[p] << " ";
            }
        }
    }
}
else if (num == 4) { //all 4 letter words
    for (int i = 0; i <= 25; i++) {
        for (int o = 0; o <= 25; o++) {
            for (int p = 0; p <= 25; p++) {
                for (int q = 0; q <= 25; q++) {
                    cout << alpha[i] << alpha[o] << alpha[p] << alpha[q] << " ";
                }
            }
        }
    }
}
else {
    cout << "Not more than 4"; //it will take more than 2 hours for generating all 5 letter words
  }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM