简体   繁体   中英

haskell words from alphabet of a given length

I have this function that generates a list of all words with a min length 0 and max length n, equals given as an input to the function:

import Data.List

words :: Int -> String -> [String]
words 0 alph = [[]]
words n alph = words (n-1) alph ++ [ ch:w | w <-words (n-1) alph, ch <- alph]

When I run this, the output is following:

> words 3 "AB"
["","A","B","A","B","AA","BA","AB","BB","A","B","AA","BA","AB","BB","AA","BA","AB","BB","AAA","BAA","ABA","BBA","AAB","BAB","ABB","BBB"]

The problem here is, that there are some words repeating, in this example, especially the words of length of 2 ("AA" is 3 times there). Can you see what am I doing wrong in my function, or do you have any idea how to solve it?

This is because the words (n-1) alph in the list comprehension does not only yield words of length n-1 but also n-2 , n-3 , etc., since that is how you defined the words function.

It might be better to make a helper function that only generates words of length n and then use that in an extra function that constructs strings with lengths up to n :

words :: Int -> String -> [String]
words 0 alph = [[]]
words n alph = [ ch:w | w <-words (n-1) alph, ch <- alph]

wordsUpTo :: Int -> String -> [String]
wordsUpTo n alph = concatMap (flip words alph) [0 .. n]

However words already exists, this is just a special case of replicateM :: Applicative m = > Int -> ma -> m [a] , so we can write this as:

import Control.Monad(replicateM)

wordsUpTo :: Int -> String -> [String]
wordsUpTo n alph = [0 .. n] >>= (`replicateM` alph)

which will produce:

Prelude Control.Monad> wordsUpTo 3 "AB"
["","A","B","AA","AB","BA","BB","AAA","AAB","ABA","ABB","BAA","BAB","BBA","BBB"]

The Applicative instance for lists effectively computes a cross-product,

> (,) <$> ["A", "B"] <*> ["C", "D"]
[("A","C"),("A","D"),("B","C"),("B","D")]

elements of which can be joined with (++) instead of (,) :

> (++) <$> ["A", "B"] <*> ["C", "D"]
["AC","AD","BC","BD"]

If you repeatedly apply this operation, you'll get the strings you want:

> (++) <$> ["A", "B"] <*> [""] -- base case
["A","B"]
> (++) <$> ["A", "B"] <*> ["A","B"]
["AA","AB","BA","BB"]
> (++) <$> ["A", "B"] <*> ["AA","AB","BA","BB"]
["AAA","AAB","ABA","ABB","BAA","BAB","BBA","BBB"]

The function you want to repeat is the section ((++) <$> ["A", "B"] <*>) , which from now on we'll refer to as f :

> f = ((++) <$> ["A", "B"] <*>)

This repeated application is captured by the iterate function, which repeatedly feeds the output of one function application as the input of the next.

> take 3 $ iterate f [""]
[[""],["A","B"],["AA","AB","BA","BB"]]

We'll want to concatenate the results into a single list:

> take 7 $ concat $ iterate f [""]
["","A","B","AA","AB","BA","BB"]

So all combinations is just

allWords alph = concat $ iterate f [""]
  where f = ((++) <$> alph <*>)

To get the elements with some maximum length, we can either

  1. Use takeWhile (\\x -> length x <= n) , or
  2. Use take (2^(n+1) - 1) (given the order in which items are generated, all the strings of a given length occur before longer strings, and we can compute the total number of strings with a given maximum length)

So we can define either

words n = takeWhile p . allWords
  where p x = length x < 4

or

words n = take n' . allWords

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM