Ignore spaces when reading from a file using get lines in Scala

I am trying to read input from a file and count the words using a map. I want to ignore spaces when reading from the file.

import scala.io.Source

val lines = Source.fromFile("file path", "utf-8").getLines()

val counts = new collection.mutable.HashMap[String, Int].withDefaultValue(0)
lines.flatMap(line => line.split(" ")).foreach(word => counts(word) += 1)
for ((key, value) <- counts) println (key + "-->" + value)

When I run this code on the following input:

hello hello
    world goodbye hello
  world

the output is

world-->2
goodbye-->1
hello-->3
-->2

It counts 2 spaces. How can I fix that?

lines.flatMap(_.trim.split("\\s+"))
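A fuller sketch of how the one-liner above could slot into the counting code. It assumes the question's sample lines are given inline instead of being read from a file, and the names countWords and input are illustrative, not from the question. trim removes leading whitespace so that split does not produce a leading empty string. The extra filter is there because a blank line still yields a single "" after trim and split.

```scala
import scala.collection.mutable

// Hypothetical helper: counts whitespace-separated words across all lines.
def countWords(lines: Iterator[String]): mutable.Map[String, Int] = {
  val counts = mutable.HashMap.empty[String, Int].withDefaultValue(0)
  lines
    .flatMap(_.trim.split("\\s+")) // split on runs of whitespace
    .filter(_.nonEmpty)            // a blank line still yields one ""
    .foreach(word => counts(word) += 1)
  counts
}

// Sample input from the question, inlined (plus one blank line) as an assumption.
val input = Iterator("hello hello", "    world goodbye hello", "  world", "")
val wordCounts = countWords(input)
for ((key, value) <- wordCounts) println(key + "-->" + value)
```

With this input, the counts are hello 3, world 2 and goodbye 1, with no entry for the empty string. The order of the printed lines is not guaranteed, since it depends on the hash map.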

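One way is to filter out the empty strings that split(" ") produces for leading or repeated spaces:

lines
  .flatMap(line => line.split(" "))
  .filter(_.nonEmpty) // split(" ") yields "" (not " ") between consecutive spaces
  .foreach(word => counts(word) += 1)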

That said, I think there is a better approach: force the iterator to evaluate with the toList method, then use groupBy with collect:

Iterator("some  word", "some    other")
  .flatMap(_.split(" "))
  .toList
  .groupBy(identity)
  .collect { case (a,b) if !a.isEmpty => (a, b.length)}

This outputs:

Map(some -> 2, word -> 1, other -> 1)

Note that this approach is probably less efficient than the one you are using, because it creates several intermediate collections. I haven't benchmarked it, but it may not be the best option for large files.
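If the intermediate collections are a concern, one possible alternative is a single foldLeft over the iterator, which builds only the result map. This is a minimal sketch, not benchmarked either. The sample strings are the ones from the snippet above, and words and foldedCounts are illustrative names.

```scala
// Single pass over the iterator: no intermediate List or grouped Map is built.
val words = Iterator("some  word", "some    other")
  .flatMap(_.split("\\s+")) // split on runs of whitespace
  .filter(_.nonEmpty)       // drop empty strings from leading whitespace or blank lines

// Accumulate the counts in an immutable Map, one word at a time.
val foldedCounts = words.foldLeft(Map.empty[String, Int]) { (acc, word) =>
  acc.updated(word, acc.getOrElse(word, 0) + 1)
}

println(foldedCounts)
```

This gives the same counts as the groupBy version: some 2, word 1, other 1.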

This approach extracts words from each line by splitting on "\\W+" (runs of non-word characters), regardless of how many spaces separate the words:

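import scala.io.Source

Source.fromFile("filepath")
  .getLines()
  .flatMap(_.trim.split("\\W+"))       // split on runs of non-word characters
  .toArray.groupBy(identity)           // group identical words together
  .map(kv => kv._1 -> kv._2.length)    // word -> number of occurrences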

For the input in the question, this gives:

res: Map(world -> 2, goodbye -> 1, hello -> 3)
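One caveat worth illustrating: "\\W+" splits on every non-word character, not just whitespace, so punctuation inside a word also acts as a separator. A small sketch comparing it with "\\s+" follows. The sample line is made up for illustration.

```scala
// Made-up sample line containing an apostrophe and a comma.
val line = "don't stop, world"

// "\\W+" treats the apostrophe and the ", " run as separators.
val byNonWord = line.split("\\W+").toList

// "\\s+" splits on whitespace only, so punctuation stays attached to the words.
val byWhitespace = line.split("\\s+").toList

println(byNonWord)    // List(don, t, stop, world)
println(byWhitespace) // List(don't, stop,, world)
```

Which pattern is preferable depends on whether punctuation should count as part of a word. For plain space-separated input like the question's, both give the same words.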
