
Printing a word length histogram in Scala

I am taking a set of lines as input and want to keep track of the distribution of word lengths. Extra whitespace in the input, including newlines, does not matter. Once the end of the input is reached, the output should be a text-based histogram of the distribution of word lengths, e.g. for "Hey How are you hey am good":

Output: 1 - 0, 2 - 1, 3 - 5, 4 - 1, 5 - 0

where the first number is the word length and the second is the number of words of that length. I have written

val lines = scala.io.Source.stdin.getLines
val words = lines.flatMap(_.split("\\W+"))

I want to group words of the same length and then store them in an iterator or a map, but

val list2 = words.groupby(e.length => e.length).mapValues(_.length) 

does not give me the desired result. Any suggestions?

You've pretty much got it, but you need groupBy(e => e.length). The left side (e) of the anonymous function (e => e.length) should be a variable name that will be used for each item in the collection (i.e., each word). So e is a word, and we group according to the lengths of the words.

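(Also, groupBy has a capital "B". And because words is an Iterator here, which has no groupBy method, convert it to a strict collection such as a Vector first.)

val list2 = words.toVector.groupBy(e => e.length).mapValues(_.length)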

If you want the output like you described, you could follow it up with:

val vectorOfLengths = (1 to list2.keys.max).map(length => list2.getOrElse(length, 0))
// Vector(0, 1, 5, 1)
println(vectorOfLengths.zipWithIndex.map{case (count, length) => f"${length+1} - $count" }.mkString(", "))
// 1 - 0, 2 - 1, 3 - 5, 4 - 1
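For reference, the steps so far can be collected into one self-contained sketch. It assumes the text is already available as a String rather than read from stdin, and the names WordLengthCounts, wordsOf, countsByLength and formatCounts are made up for illustration. It uses a plain map instead of mapValues, since mapValues on a strict Map is deprecated in Scala 2.13, and it drops the empty strings that split can produce when a line starts with punctuation.

```scala
object WordLengthCounts {
  // Split on runs of non-word characters and drop any empty strings.
  def wordsOf(text: String): Vector[String] =
    text.split("\\W+").toVector.filter(_.nonEmpty)

  // Map from word length to the number of words of that length.
  def countsByLength(words: Seq[String]): Map[Int, Int] =
    words.groupBy(_.length).map { case (len, group) => len -> group.size }

  // "1 - 0, 2 - 1, ..." for every length from 1 up to the longest word seen.
  def formatCounts(counts: Map[Int, Int]): String =
    if (counts.isEmpty) ""
    else
      (1 to counts.keys.max)
        .map(len => s"$len - ${counts.getOrElse(len, 0)}")
        .mkString(", ")
}
```

With the sentence from the question, WordLengthCounts.formatCounts(WordLengthCounts.countsByLength(WordLengthCounts.wordsOf("Hey How are you hey am good"))) gives "1 - 0, 2 - 1, 3 - 5, 4 - 1".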

Or, hey, how about visually?

for ((count, length) <- vectorOfLengths.zipWithIndex)
  println(f"${length+1}: ${"#" * count}")
//   1:
//   2: #
//   3: #####
//   4: #

Just for fun, how about a visual histogram of Alice in Wonderland?

val aliceLines = io.Source.fromFile("/Users/dhg/texts/alice.txt").getLines.toVector
val aliceWords = aliceLines.flatMap(_.split("\\W+"))
val aliceHist = aliceWords.groupBy(_.length).mapValues(_.length)
val aliceLengths = (1 to aliceHist.keys.max).map(aliceHist.getOrElse(_, 0))
for ((count, length) <- aliceLengths.zipWithIndex)
  println(f"${length+1}%2s: ${"#" * (count/100)}")

//     1: ###################
//     2: ##################################################
//     3: ############################################################################
//     4: #############################################################
//     5: ###################################
//     6: ######################
//     7: ##################
//     8: ########
//     9: ######
//    10: ###
//    11: #
//    12:
//    13:
//    14:
//    15:
//    16:
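The count/100 divisor above is tuned by hand to this particular text. As a rough alternative, the bars could be scaled relative to the largest count so that the longest bar always fits a chosen width. The sketch below is only one way to do that; ScaledHistogram, render and maxWidth are hypothetical names, and counts is assumed to hold the count for length i + 1 at index i, like aliceLengths.

```scala
object ScaledHistogram {
  // counts(i) is the number of words of length i + 1.
  // Each bar is scaled so that the largest count is maxWidth characters wide.
  def render(counts: Seq[Int], maxWidth: Int = 60): Seq[String] = {
    val largest = if (counts.isEmpty) 0 else counts.max
    counts.zipWithIndex.map { case (count, i) =>
      // Integer division rounds down, so small non-zero counts can still
      // produce an empty bar, just as with count/100.
      val width = if (largest == 0) 0 else count * maxWidth / largest
      val bar = "#" * width
      f"${i + 1}%2d: $bar"
    }
  }
}
```

Usage would look like ScaledHistogram.render(aliceLengths).foreach(println).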

Well, Scala collections provide a groupBy method; for Seq, its signature looks like this:

def groupBy[K](f: (A) ⇒ K): immutable.Map[K, Seq[A]]

which means it applies a function to every element in the collection and groups the elements by the result. To group words by their length, the function should take a string and return its length:

//words: Seq[String] = Seq(a, b, c, dd, eee, fff)
val byLength = words.groupBy{(w:String) => w.length}//Map(2 -> Seq(dd), 
                                                    //    1 -> Seq(a, b, c), 
                                                    //    3 -> Seq(eee, fff))

Or you can make it a bit shorter by omitting the parameter type declaration; the compiler will infer it:

val byLength = words.groupBy(w => w.length)

or even define the anonymous function with an underscore placeholder:

val byLength = words.groupBy(_.length) //same thing

Now you can get the words with a specified length:

val singleCharacterWords = byLength(1) //Seq(a, b, c)

or check whether the map contains a given length:

byLength.contains(1) //true
byLength.contains(5) //false
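Note that byLength(5) would throw a NoSuchElementException here, because no word has length 5. If a missing length should simply mean "no words", get or getOrElse can be used instead of checking contains first. A small sketch, with SafeLookup, wordsOfLength and countOfLength as illustrative names only:

```scala
object SafeLookup {
  val words: Seq[String] = Seq("a", "b", "c", "dd", "eee", "fff")
  val byLength: Map[Int, Seq[String]] = words.groupBy(_.length)

  // Words of the given length, or an empty Seq if there are none.
  def wordsOfLength(n: Int): Seq[String] =
    byLength.getOrElse(n, Seq.empty)

  // Number of words of the given length, 0 if the length never occurs.
  def countOfLength(n: Int): Int =
    byLength.get(n).map(_.size).getOrElse(0)
}
```

Here SafeLookup.wordsOfLength(5) is empty and SafeLookup.countOfLength(5) is 0, with no exception thrown.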

or iterate through all keys:

byLength.foreach{
  case (length:Int, wordsGroup:Seq[String]) => 
    println(s"Words with length $length : ${wordsGroup.mkString(" ")}")
}
//Words with length 2 : dd
//Words with length 1 : a b c
//Words with length 3 : eee fff
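The keys come out as 2, 1, 3 because the Map returned by groupBy makes no promise about iteration order. For a histogram you would usually want the lengths in ascending order, which can be had by sorting the entries by key first. A minimal sketch, where SortedGroups and describe are hypothetical names:

```scala
object SortedGroups {
  // One line per word length, in ascending order of length.
  def describe(words: Seq[String]): Seq[String] =
    words.groupBy(_.length).toSeq.sortBy(_._1).map { case (length, group) =>
      val joined = group.mkString(" ")
      s"Words with length $length : $joined"
    }
}
```

SortedGroups.describe(Seq("a", "b", "c", "dd", "eee", "fff")).foreach(println) then prints the groups for lengths 1, 2 and 3 in that order.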

See the Map API documentation for more.

